Abstract

Regenerating codes and codes with locality are two schemes that have recently been proposed to ensure data 
collection and reliability in a distributed storage network. In a situation where one is attempting to repair a failed 
node, regenerating codes seek to minimize the amount of data downloaded for node repair, while codes with locality 
attempt to minimize the number of helper nodes accessed. In this paper, we provide several constructions for a class 
of vector codes with locality in which the local codes are regenerating codes, that enjoy both advantages. We derive 
an upper bound on the minimum distance of this class of codes and show that the proposed constructions achieve 
this bound. The constructions include both the cases where the local regenerating codes correspond to the MSR 
as well as the MBR point on the storage-repair-bandwidth tradeoff curve of regenerating codes. Also included is a 
performance comparison of various code constructions for fixed block length and minimum distance. 



I. Introduction 

Apart from ensuring reliability, the principal goals in a distributed storage network relate to data collection and 
node repair. We will seek architectures which store the data across n nodes in such a way that a data collector can 
recover the data by connecting to a small number k of nodes in the network. Node repair will be accomplished by 
connecting to a subset of d nodes and downloading a uniform amount of data from each node for a total download 
of W. Here W is termed the repair bandwidth and it is of interest to minimize both W as well as the repair degree, 
defined as the number d of nodes accessed during repair. It is also desirable to have multiple options for both data 
collection and node repair in terms of the set of k or d nodes that one connects to. 

Distributed storage systems found in practice include Windows Azure Storage [3] and the Hadoop-based systems [4]
used in Facebook and Yahoo. In Facebook data centers, a [14, 10] maximum-distance separable (MDS) code is used
in a coding scheme referred to as HDFS RAID [5]. Here data can be downloaded by connecting to any 10 nodes. The
coding scheme is, however, inefficient in terms of node repair, as the repair degree and the repair bandwidth
both equal 10. Two alternative approaches to coding, regenerating codes [6] and codes with locality [7], have
recently been advocated to enable more efficient node repair.



A. Regenerating Codes 

In the regenerating-code framework, there are n nodes in the network, with each node storing $\alpha$ code symbols
drawn from a finite field $\mathbb{F}_q$. A data collector should be able to download the data by connecting to any
k nodes (see Fig. 1). Node repair is required to be accomplished by connecting to any d nodes and downloading
$\beta \le \alpha$ symbols from each node. Thus the repair bandwidth is given by $d\beta$. A regenerating code may
be regarded as a vector code, i.e., a code of block length n over the vector alphabet $\mathbb{F}_q^{\alpha}$. The
parameter set of a regenerating code will be listed in one of two forms: $((n, k, d), (\alpha, \beta), B)$ if the file
size or number of message symbols B is known and relevant, and $((n, k, d), (\alpha, \beta))$ otherwise.

A part of the work in Section III of this paper has appeared in an earlier arXiv submission, see [2].

Fig. 1. The Regenerating Code Framework.

A cut-set bound based on network-coding concepts tells us that, given code parameters
$((n, k, d), (\alpha, \beta), B)$, the size B of the data file is upper bounded [6] by

$$B \le \sum_{i=0}^{k-1} \min\{\alpha, (d-i)\beta\}. \qquad (1)$$

A regenerating code is considered optimal if

1) the file size B satisfies (1), and

2) the bound is violated if either $\alpha$ or $\beta$ is reduced.

Given the file size B as well as regenerating-code parameters (k, d), there are multiple pairs $(\alpha, \beta)$
that satisfy (1). This leads to the storage-repair-bandwidth trade-off shown in Fig. 2(a). The two extremal points
in the trade-off are the Minimum Storage Regeneration (MSR) and Minimum Bandwidth Regeneration (MBR) points. At
the MSR point, we have $\alpha = \frac{B}{k} = (d - k + 1)\beta$, and at the MBR point, $\alpha = d\beta$. The
remaining points on the trade-off curve will be referred to as interior points. A regenerating code is said to be
exact if the replacement of a failed node stores the same data as did the failed node, and functional otherwise.
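As a quick numerical check of the two extremal points, the following minimal sketch evaluates the right-hand side of the cut-set bound (1) for the parameters quoted for Fig. 2(a), namely B = 7500, k = 10, d = 12. The function names are illustrative only and do not come from the paper; the MBR expression for $\beta$ assumes that (1) holds with equality when $\alpha = d\beta$.

```python
from fractions import Fraction

def cut_set_bound(k, d, alpha, beta):
    """Right-hand side of the cut-set bound (1)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))

def msr_point(B, k, d):
    """(alpha, beta) at the MSR point: alpha = B/k = (d - k + 1) * beta."""
    alpha = Fraction(B, k)
    beta = alpha / (d - k + 1)
    return alpha, beta

def mbr_point(B, k, d):
    """(alpha, beta) at the MBR point: alpha = d * beta, B = (dk - C(k,2)) * beta."""
    beta = Fraction(B, d * k - k * (k - 1) // 2)
    return d * beta, beta

B, k, d = 7500, 10, 12
for name, (alpha, beta) in [("MSR", msr_point(B, k, d)), ("MBR", mbr_point(B, k, d))]:
    # Both extremal points meet the bound (1) with equality.
    assert cut_set_bound(k, d, alpha, beta) == B
    print(name, "alpha =", alpha, "beta =", beta, "repair bandwidth =", d * beta)
```

For these parameters the MSR point stores less per node but downloads more during repair, while the MBR point does the opposite, which is the trade-off the curve depicts.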

1) Regenerating Code Constructions: It has been shown in [8] that the interior points on the trade-off are
not achievable using exact-repair regenerating codes. We summarize below the constructions known in the literature
for the MSR and MBR points. Except where otherwise noted, all results described below pertain to exact-repair
regenerating codes.

a) MBR Point: There are two principal families of MBR codes:

(i) The repair-by-transfer family discussed in Example 1.

(ii) MBR codes constructed using the product-matrix construction, see [9]. This construction can be used to
generate MBR codes for any value of code parameters

$$\left( (n, k, d),\ (\alpha = d\beta,\ \beta = 1),\ B = dk - \binom{k}{2} \right).$$
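The file size quoted for the MBR point can be recovered directly from the cut-set bound (1). The short derivation below is a sketch under the assumption that (1) is met with equality; with $\alpha = d\beta$, every term of the sum is limited by $(d-i)\beta$ rather than by $\alpha$.

```latex
B \;=\; \sum_{i=0}^{k-1} \min\{d\beta,\ (d-i)\beta\}
  \;=\; \beta \sum_{i=0}^{k-1} (d-i)
  \;=\; \left( dk - \binom{k}{2} \right)\beta .
```

Setting $\beta = 1$ gives $B = dk - \binom{k}{2}$; for instance, k = 3 and d = 4 yield B = 12 - 3 = 9.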

b) MSR Point: At the MSR point, we have $\alpha = (d - k + 1)\beta$. There are several families of MSR codes:

(i) MSR codes constructed using the product-matrix construction, see [9]. This construction can be used to
generate MSR codes for any value of code parameters

$$((n, k, d \ge 2k - 2),\ (\alpha,\ \beta = 1),\ B = k\alpha).$$

(ii) MSR codes with parameters

$$((n, k, d = n - 1 \ge 2k - 1),\ (\alpha,\ \beta = 1),\ B = k\alpha),$$

described in [10] and [11].

(iii) The Hadamard-design-based construction [12] of high-rate MSR codes with parameters

$$((n, k = n - 2, d = n - 1),\ (\alpha,\ \beta = 2^{k}),\ B = k\alpha).$$

(iv) The Zigzag code construction [13] of high-rate MSR codes with parameters

$$((n, k = n - m, d = n - 1),\ (\alpha,\ \beta = m^{k-1}),\ B = k\alpha),$$

that are guaranteed to only repair systematic nodes.

(v) An explicit, functional-repair MSR code with parameters

$$((n, k, d = k + 1),\ (\alpha = 2,\ \beta = 1),\ B = k\alpha),$$

can be found in [10].

(vi) Apart from these explicit constructions, the existence of MSR codes for all (n, k, d), $n > d \ge k$, is shown
in [14].

Fig. 2. Storage-Bandwidth Tradeoff and an example construction. (a) Storage-repair-bandwidth trade-off for fixed
values of B = 7500, k = 10, d = 12. (b) Pictorial depiction of the repair-by-transfer MBR code in Example 1.

2) Other Work Related to Regenerating Codes:

a) Fractional repetition codes, a framework studied in [15], is related to the repair-by-transfer MBR code discussed
above. Under this framework, node repair is required to be carried out without any computations, i.e., by mere
transfer of data. The requirement on node repair is relaxed in the sense that one needs to be able to recover
from failure of a node by connecting to any one of several subsets of d nodes rather than by connecting to any
d nodes.

b) The framework of cooperative regenerating codes, where multiple node repairs are carried out simultaneously
and in a cooperative manner, has been studied in [16]. A cut-set based bound is derived and two explicit classes
of constructions are presented there.

Studies on implementation and performance evaluation of regenerating codes in distributed storage settings can
be found in [17], [18], [19].

An example construction of a regenerating code, taken from [8], is given below.



Example 1: In the example (see Fig. 2(b)), the regenerating code has parameters
$((n = 5, k = 3, d = 4), (\alpha = 4, \beta = 1), B = 9)$. The collection of B = 9 message symbols is first encoded
using a [10, 9, 2] MDS code of block length 10. Each code symbol is then placed on a distinct edge of a
fully-connected graph with 5 nodes. The code symbols stored in a node are the symbols associated with the edges
incident on that node. It follows that every pair of nodes shares exactly one code symbol. A data collector connects
to k = 3 nodes and thus has access to $k\alpha - \binom{k}{2} = 12 - 3 = 9$ distinct code symbols of the MDS code and
can hence decode the message symbols.

Node repair is easily accomplished by the simple means of symbol transfer: the replacement of a failed node simply
receives, from each of the neighbors of the failed node, the symbol that the two nodes share in common. The code
can be verified to achieve the upper bound in (1) corresponding to the MBR point, i.e., corresponding to
$\alpha = d\beta$, and for this reason these codes are referred to as repair-by-transfer MBR (RBT-MBR) codes. The
construction generalizes to any parameter set of the form $((n, k, d = n - 1), (\alpha = n - 1, \beta = 1))$ and the
file size B is then given by

$$B = k\alpha - \binom{k}{2},$$

and can be shown to achieve the cut-set bound at the MBR point.
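The construction in Example 1 can be sketched in a few lines. The snippet below is illustrative only: it assumes a single-parity-check code over a prime field GF(257) as the [10, 9, 2] MDS code (the paper does not fix the field or the MDS code), and the message values and function names are hypothetical.

```python
from itertools import combinations

P = 257   # hypothetical prime field size chosen for this sketch
n, k = 5, 3

def encode(message):
    """[10, 9, 2] single-parity-check code over GF(P): all 10 symbols sum to 0 mod P."""
    assert len(message) == 9
    return list(message) + [(-sum(message)) % P]

edges = list(combinations(range(n), 2))       # the 10 edges of the complete graph on 5 nodes
codeword = encode([11, 22, 33, 44, 55, 66, 77, 88, 99])
symbol_on_edge = dict(zip(edges, codeword))   # one code symbol per edge

# Node v stores the alpha = 4 symbols on the edges incident on v.
node = {v: {e: symbol_on_edge[e] for e in edges if v in e} for v in range(n)}

def collect(nodes):
    """Data collector: pool the symbols of the given nodes and decode the message."""
    seen = {}
    for v in nodes:
        seen.update(node[v])
    word = [seen.get(e) for e in edges]
    missing = [i for i, s in enumerate(word) if s is None]
    assert len(missing) <= 1                   # any 3 nodes cover 9 of the 10 edges
    for i in missing:                          # recover the missing symbol from the parity check
        word[i] = (-sum(s for s in word if s is not None)) % P
    return word[:9]

def repair(failed):
    """Repair by transfer: each surviving node hands over the symbol it shares with the failed node."""
    return {e: node[helper][e]
            for helper in range(n) if helper != failed
            for e in edges if failed in e and helper in e}

assert all(collect(trio) == codeword[:9] for trio in combinations(range(n), k))
assert all(repair(v) == node[v] for v in range(n))
```

Each repair downloads exactly one symbol from each of the d = 4 surviving nodes and performs no arithmetic, which is what the term repair-by-transfer refers to.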



B. Codes with Locality 

In [7], Gopalan et al. introduced the interesting notion of locality of information. This was also, in part, motivated
by applications to distributed storage, where the aim was to design codes in such a way that the number of remaining
nodes accessed to repair a failed node is much smaller than the block length of the code. The i-th code symbol
$c_i$, $1 \le i \le n$, of an [n, k, d] linear code $\mathcal{C}$ over the field $\mathbb{F}_q$ is said to have
locality r if this symbol can be recovered by accessing at most r other code symbols of code $\mathcal{C}$.
Equivalently, for any coordinate i, there exists a row in the parity-check matrix of the code of Hamming weight at
most r + 1, whose support includes i. An (r, d) code was defined as a systematic linear code $\mathcal{C}$ having
minimum distance d, where all k message symbols have locality r. It was shown that the minimum distance of an
(r, d) code is upper bounded by

$$d \le n - k - \left\lceil \frac{k}{r} \right\rceil + 2.$$

A class of codes constructed earlier and known as pyramid codes [20] are shown to be (r, d) codes that are optimal
with respect to this bound. The structure of an optimal code is deduced for the case when $r \mid k$ and
$d < r + 3$, and it is shown that the local codes must necessarily be MDS and support disjoint. The paper also
introduces the notion of all-symbol locality, in which all the code symbols, not just the message symbols, have
locality r. The existence of optimal codes with all-symbol locality was established for the case when
$(r + 1) \mid n$.

1) Other Work on Codes with Locality: A class of codes with locality known as Homomorphic Self-Repairing
Codes, which makes use of linearized polynomials, was introduced in an earlier work by the authors of [21]. These
codes have all-symbol locality and an example provided in [21] turns out to be optimal with respect to the bound in
[7]. A general construction of explicit and optimal codes with all-symbol locality, based on Gabidulin maximum
rank-distance codes, is provided in [22].

Locality in vector codes is considered in [23]. The authors derive an upper bound on the minimum distance of a
vector code under the assumption of all-symbol locality and also provide an explicit construction of a class of codes
which achieve the bound for certain code-parameter sets. This construction is related to an earlier construction of
codes with locality (see [24]) involving the same authors.

The notion of locality in scalar codes was subsequently extended by the authors of the present paper in [1] to the
case when the local codes have minimum distance greater than 2. An analogous bound on minimum distance and
code constructions are provided, and these results are described in detail in Section III. An earlier, parity-splitting
construction appearing in [25] turns out to provide an example of such an extension of the notion of locality. A
similar construction was subsequently presented in [26] in the context of solid-state storage drives. These results
will be revisited in Section III. The results in [27] are described in Section II.

Studies on implementation and performance evaluation of codes with locality in distributed storage settings can
be found in [3], [28]. In [3], a class of codes termed local reconstruction codes, related to the pyramid code,
has been employed in a distributed storage solution known as Windows Azure Storage, see Fig. 3. This
code has block length 16 and by puncturing the code in the two coordinates $P_1, P_2$, one obtains a code that is
the concatenation of two support-disjoint single-parity-check [7, 6, 2] MDS codes (corresponding to code symbols
labeled using X and Y respectively), which provide locality. The two global parity symbols $P_1, P_2$ ensure that the
minimum distance of the overall code equals 4.

Fig. 3. The pyramid code employed in Windows Azure Storage.

In [28], the authors discuss implementation of a class of codes with locality (called locally repairable codes) in
the Hadoop Distributed File System and compare the performance with Reed-Solomon codes.
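To make the role of the two [7, 6, 2] local codes in the Windows Azure Storage example concrete, the following minimal sketch repairs a single lost symbol inside one local group. It is an illustration under stated assumptions: the local parity is taken to be a bytewise XOR, the data bytes are made up, and the two global parities $P_1, P_2$ are omitted because they play no part in single-symbol local repair.

```python
from functools import reduce
from operator import xor

def local_parity(group):
    """Parity symbol of a [7, 6, 2] single-parity-check code (XOR of the 6 data symbols)."""
    return reduce(xor, group)

x = [0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC]   # hypothetical data bytes, group X
y = [0x01, 0x23, 0x45, 0x67, 0x89, 0xAB]   # hypothetical data bytes, group Y
px, py = local_parity(x), local_parity(y)

def repair_in_group(group, parity, lost):
    """Rebuild group[lost] from the 5 surviving data symbols and the local parity."""
    helpers = [s for i, s in enumerate(group) if i != lost] + [parity]
    return reduce(xor, helpers), len(helpers)

value, helpers_used = repair_in_group(x, px, lost=2)
assert value == x[2] and helpers_used == 6
```

The repair touches 6 symbols, all within the same local group, whereas an MDS code with the same 12 data symbols would need to contact 12 symbols to rebuild one.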



C. Array Codes 

Regenerating codes are examples of vector codes, by which we mean codes over a vector alphabet $\mathbb{F}_q^m$ for
some integer m. In the case of regenerating codes, $m = \alpha$. Any vector code may also be regarded as an array
code in which each codeword corresponds to an array of size $(m \times n)$. A survey of array codes can be found in
[29]. Array codes have found extensive application in storage systems and examples include the EVENODD code
constructed in [30] and later extended in [31], as well as the Row-Diagonal Parity code presented in [32].

Section II of the paper provides a brief overview of the results of the present paper. A comparison of some coding
options for distributed storage also appears there. The extended notion of scalar locality is discussed in Section III.
Section IV introduces vector codes and Section V discusses locality in the context of vector codes and provides
bounds on minimum distance and code size assuming the local codes to be identical. Locality in the context of
vector codes permits one to consider codes in which the local codes are regenerating codes. Optimal constructions
of vector codes with locality, where the local codes are MSR and MBR codes, are presented in Sections VI and VII,
respectively. In Section VIII, additional bounds on minimum distance are derived that take into account the particular
structure of the code and which do not require the local codes to be identical. Most proofs are relegated to the
Appendix.



II. Overview of Results 

A. Results in Summary 

In terms of coding options for distributed storage, regenerating codes aim to minimize the download bandwidth
during node repair, whereas codes with locality seek to reduce the number of helper nodes contacted. This raises
the question as to whether it is possible to design codes that combine the desirable features of both classes of codes,
i.e., construct codes with locality in which the local codes are regenerating codes. The present paper answers this
in the affirmative. We term such codes as codes with local regeneration or, equivalently, local regenerating codes.
We develop bounds on the minimum distance of local regenerating codes as well as several constructions of codes
that achieve these bounds with equality and are hence optimal.

In an independent and parallel work¹, the authors of [27] also consider codes with all-symbol locality where
the local codes are regenerating codes. Bounds on minimum distance are provided and a construction for optimal
codes with MSR all-symbol locality, based on rank-distance codes, is presented.

We now briefly state the various results contained in this paper. 

¹ Both papers were presented at the Workshop on Trends in Coding Theory, Ascona, Oct. 29-Nov. 2, 2012.

TABLE I
Bounds on Minimum Distance Appearing in the Paper

| Theorem | Bound | Comments |
|---|---|---|
| Theorem 3.1 | $d_{\min} \le n - k + 1 - \left( \left\lceil \frac{k}{r} \right\rceil - 1 \right)(\delta - 1)$ | Bound for scalar codes with $(r, \delta)$ information locality |
| Theorem 5.1 (URA Bound) | $d_{\min} \le n - P^{(\mathrm{inv})}(K) + 1$ | Bound for vector codes with exact $(r, \delta)$ information locality with URA local codes |
| Theorem 8.1 | $d_{\min} \le n - \left\lceil \frac{K}{\alpha} \right\rceil + 1 - \left( \left\lceil \frac{K}{r\alpha} \right\rceil - 1 \right)(\delta - 1)$ | Bound for vector codes with $(r, \delta)$ information locality |

• Extension of Notion of Scalar Locality: The paper begins by extending the notion of locality in scalar codes,
where we allow the local codes to be more general codes, instead of just single parity check codes (and hence
can have local minimum distance $\delta > 2$). An upper bound on the minimum distance is derived and the structure
of a code that achieves this bound is deduced for the case when the dimension of the local code divides the
dimension of the overall or global code. It is shown that pyramid codes achieve the upper bound on minimum
distance. The existence of optimal codes with all-symbol locality when the local code length divides the global
code length is also shown. The explicit construction of codes with all-symbol locality contained in [25], called
the parity-splitting construction, is also presented and shown to be optimal for certain parameter sets. It is noted
that concatenated codes are examples of codes with all-symbol locality and this is used to obtain a new upper
bound on the minimum distance of a concatenated code. We note that most of the results on the extension of
scalar locality have appeared in [1].

• Vector Codes: The discussion of codes with local regeneration necessitates a discussion of vector codes, of
which they are an example. As such, some basic observations about codes possessing a vector alphabet are
made here and it is shown that exact-repair MBR and MSR regenerating codes naturally fall into a particular
class of vector codes, which we term uniform rank-accumulation (URA) codes.

• Vector Codes with Locality: This is followed by an extension of the notion of locality to vector codes and a
bound on the minimum distance is derived for the case when the local codes have identical parameters and
belong to the class of URA codes. The structure of the code is determined under additional assumptions.
These assumptions hold for the case when (a) the local codes are MBR codes and (b) the local codes are MSR
codes and the scalar dimension of the local code divides the scalar dimension of the global code. The scalar
dimension of a code over the vector alphabet is its dimension as a vector space over $\mathbb{F}_q$.

• Codes with Local Regeneration: We then provide several constructions for the class of codes with local
regeneration, which are optimal with respect to the upper bound on the minimum distance. The constructions
include both the cases where the local codes belong to the MSR and the MBR family of regenerating codes.

• Bounds on Minimum Distance of a General Vector Code with Locality: Finally, we also provide additional
bounds on minimum distance that take into account the particular structure of the vector code and which do
not require the local codes to have identical parameters.

A summary of the bounds on minimum distance derived in this paper is given in Table I. An overview of various
constructions (appearing in this paper) of codes with local regeneration is presented next.



B. Overview of Constructions of Codes with Local Regeneration 

The constructions presented in this section are optimal with respect to the bound on minimum distance of a code
with local URA codes as well as the bound on scalar dimension presented in Theorem 5.1 of Section V.

(a) Sum-Parity MSR-Local Code: The construction is illustrated in Fig. 4. The construction begins with a parent
MSR code whose generator matrix is of the form $[I \mid P_1 \mid P_2]$ and which is, moreover, such that the
punctured code having generator matrix $[I \mid P_1]$ is also an MSR code. The codewords in the constructed local
regenerating code are then of the form

$$[\, m_a^t \mid m_a^t P_1 \mid m_b^t \mid m_b^t P_1 \mid (m_a + m_b)^t P_2 \,],$$

where $m_a, m_b$ are the message vectors associated with the two constituent local regenerating codes. This
construction turns out to yield optimal codes regardless of the number of constituent local codes, provided that
the global minimum distance $d_{\min}$ does not exceed twice the local minimum distance $\delta$.

Fig. 4. The Sum-Parity MSR-Local Code Construction.

(b) Pyramid-like MSR-Local Code: This construction mimics the construction of pyramid codes, with the difference
that we are now dealing with vector symbols in place of scalars, and local MSR codes in place of local MDS
codes. If we puncture $\Delta$ thick columns, and the repair degree of the MSR code that we start out with is less
than $n - \Delta$, then the construction will result in an optimal MSR-Local code.

(c) Repair-by-Transfer MBR-Local Codes: In a repair-by-transfer MBR code, the vector MBR code may be regarded
as being built on top of a scalar MDS code. A scalar pyramid code has constituent local codes which are scalar
MDS codes. The scalar pyramid code also possesses a certain number p of global parity symbols. The present
construction begins with a scalar pyramid code in which there are $\ell$ local MDS codes and where the number
of global parity symbols p is a multiple of $\alpha$, say $p = \Delta\alpha$. The next step is the building of a
separate repair-by-transfer MBR code on top of each of the $\ell$ constituent local MDS codes. In the final step,
$\Delta$ global-parity nodes are added, each containing a disjoint set of $\alpha$ scalar global parities of the scalar
pyramid code. The construction is illustrated in Fig. 5.

Fig. 5. The Repair-by-Transfer MBR-Local code is shown on top. The code below is the underlying scalar pyramid code
used to construct the MBR-Local code.

(d) Repair-by-Transfer MBR-Local Codes with All-Symbol Locality: The difference between this and the immediately
preceding Repair-by-Transfer MBR-Local code construction is that the scalar pyramid code employed
in that construction is replaced here by a scalar all-symbol locality code. Thus the construction begins with
a scalar all-symbol locality code in which there are $\ell$ local MDS codes. The next step is the building of a
separate repair-by-transfer MBR code on top of each of the $\ell$ constituent local MDS codes. The construction
is illustrated in Fig. 6.

Fig. 6. The Repair-by-Transfer MBR-Local Code with All-Symbol Locality.

(e) We also show the existence, using counting arguments, of

• MSR-Local codes with information locality and of

• MSR-Local codes with all-symbol locality,

whenever the field size q is sufficiently large. The existence results hold for a larger set of parameters than
what we present using explicit constructions.

A tabular summary of the various constructions contained in this paper is presented in Table II. This table
summarizes the constructions of vector codes with locality whose local codes are regenerating codes. A performance
comparison of the various classes of codes discussed so far is given in the next subsection.

C. Performance Comparison 

We now provide a method for approximately comparing the performance of codes with local regeneration with
that of regenerating codes and scalar codes with locality. The parameters against which comparison is made are
as follows:

1) the storage overhead $\Omega$, which is the inverse of the code rate,

2) the normalized average bandwidth $\varrho$ needed to carry out node repair; the normalization is carried out both
with respect to the amount of data stored as well as the code length n, since the number of node failures will
typically be proportional to n, as is the case, for example, under a Poisson model of node failures, and

3) the repair degree, i.e., the number h of helper nodes that need to be accessed to repair a failed node.

We assume that all codes are designed to offer roughly the same level of reliability, which we will translate to
mean that codes having the same block length n must have the same value of minimum distance $d_{\min}$. Note that
the repair degree h is given by

• h = d in the case of a regenerating code

• $h \le r$ in the case of a scalar local code

• h = d in the case of a local regenerating code, where d is, in this case, the repair degree of the constituent
local regenerating codes.

TABLE II
Summary of Constructions of Codes with Local Regeneration

| Construction | Construction Type | Locality Type | Rate Optimality | Field Size | Restrictions on Parameters |
|---|---|---|---|---|---|
| Sum-Parity, Constr. 6.1 | Explicit | MSR, Information | Optimal | Field size of underlying MSR code | $d_{\min} \le 2\delta$ |
| Pyramid-like, Constr. 6.3 | Explicit | MSR, Information | Optimal | Field size of underlying MSR code | |
| Thm. 6.5 | Existence | MSR, Information | Optimal | $\binom{n}{mr}$ | $K = mr\alpha$ |
| Thm. 6.6 | Existence | MSR, All-Symbol | Optimal | | $n = m(r + \delta - 1)$, $K = \ell\alpha$ |
| RBT-MBR, Constr. 7.1 | Explicit | MBR, Information | Optimal | $n\alpha$ | $K_L \mid K$ |
| RBT-MBR, Constr. 7.3 | Existence | MBR, All-Symbol | Optimal | | $K_L \mid K$ and $(r + \delta - 1) \mid n$ |

Here $K_L$ is the size of the local MBR code and K is the total file size.

In general, codes with locality offer a smaller value of repair degree for a given block length of the code. The 
challenge therefore, is to construct codes with locality, which compare favorably with regenerating codes in terms 
of the two other performance metrics, namely, storage overhead and repair bandwidth. 

To compare the storage overhead and repair bandwidth of the various code constructions, we proceed as follows. 
We assume that a user desires to store a file of size K across n nodes for a time period T with each node storing 
a symbols. A cost is associated with both node storage as well as for bandwidth consumed during node repair. 
We also assume a Poisson-process model of node failures for the whole system. Under this model, the number of 
failures in time T is proportional to the product of T and the number of nodes n (for large n). For simplicity, 
we only consider the case of single-node repairs in the plots, although a similar analysis can be carried out under 
the assumption of multiple node failures. The average cost of a single repair for a coding scheme is taken as the
average amount of data downloaded to repair a node, which we denote by $\omega$. The cost of storage is assumed to
be proportional to the amount of data stored, i.e., to $n\alpha$.

With this, it follows that if $\gamma(K, T)$ denotes the average cost incurred to store a file of size K for a time
period T using a particular coding scheme, then

$$\gamma(K, T) = \left( \gamma_{\omega}\, n\omega + \gamma_{s}\, n\alpha \right) T \qquad (2)$$

for some proportionality constants $\gamma_{\omega}, \gamma_{s}$. Hence the average cost incurred in storing one symbol
for one unit of time is given by

$$\frac{\gamma(K, T)}{KT} = \gamma_{\omega}\, \frac{n\omega}{K} + \gamma_{s}\, \frac{n\alpha}{K}. \qquad (3)$$

We will refer to the quantity $\frac{n\omega}{K}$ as the normalized repair bandwidth $\varrho$ of the code. Thus the
average cost is a linear combination of the normalized repair bandwidth $\varrho = \frac{n\omega}{K}$ as well as the
storage overhead $\Omega = \frac{n\alpha}{K}$.
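The two metrics in this cost model are simple ratios, and a small sketch may help in reading the plots that follow. The code below assumes the reading of the model given here, namely storage overhead $n\alpha/K$ and normalized repair bandwidth $n\omega/K$, with $\omega$ the average download for a single repair; the function names and the weights passed in the last call are hypothetical.

```python
from fractions import Fraction

def storage_overhead(n, alpha, K):
    """n * alpha / K: symbols stored per message symbol (inverse of the code rate)."""
    return Fraction(n * alpha, K)

def normalized_repair_bandwidth(n, omega, K):
    """n * omega / K, with omega the average download for a single node repair."""
    return Fraction(n * omega, K)

def cost_per_symbol_per_unit_time(n, alpha, omega, K, gamma_omega, gamma_s):
    """Linear combination of the two metrics; the gamma weights are free constants."""
    return (gamma_omega * normalized_repair_bandwidth(n, omega, K)
            + gamma_s * storage_overhead(n, alpha, K))

# Repair-by-transfer MBR code of Example 1: n = 5, alpha = 4, K = B = 9, omega = d * beta = 4.
n, alpha, omega, K = 5, 4, 4, 9
print(storage_overhead(n, alpha, K))             # 20/9
print(normalized_repair_bandwidth(n, omega, K))  # 20/9
print(cost_per_symbol_per_unit_time(n, alpha, omega, K, gamma_omega=1, gamma_s=2))  # 20/3
```

At the MBR point the two ratios coincide because a repair downloads exactly as much as a node stores; for other codes they differ, which is why both axes are needed in the comparison.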

In Fig. 7, the performance of a representative set of codes with local regeneration (obtained via both explicit
constructions and existential arguments) having common length n = 60 and common minimum distance $d_{\min} = 8$
is plotted for the case of a single node failure. Also included is a plot of the family of regenerating codes with
parameters (n = 60, k = 53, d = 59). The repair degree is chosen as d = 59, since this results in the best possible
tradeoff between normalized repair bandwidth and storage overhead for this class of codes. In the plots, the X-axis
denotes the storage overhead $\Omega$. In the first plot, the Y-axis denotes the normalized repair bandwidth
$\varrho$, while in the second plot, the Y-axis denotes the average number of nodes accessed during repair. We see
that codes with local regeneration not only have better access, but are also comparable to regenerating codes in
terms of storage overhead and repair bandwidth. Such plots could be drawn for the case of multiple node failures
as well.

Fig. 7. The performance of various code constructions presented in this paper, as well as that of regenerating codes,
all having common length 60 and minimum distance 8. Taken together, the two plots permit a comparison of the various
codes in terms of normalized repair bandwidth, storage overhead and access (i.e., repair degree).



III. Scalar Codes with Locality 

In this section, we extend the notion of locality in [7] to the case when the local codes are allowed to be more
general codes than just single parity check codes. We derive an upper bound on the minimum distance of such 
codes and also deduce the structure of the global code when 

(a) the bound on minimum distance is achieved with equality and 

(b) the maximum possible dimension of a local code divides the dimension of the global code. 

We then discuss three code constructions, all of which are optimum with respect to the upper bound on minimum 
distance. Finally, we end by making a comparison with concatenated codes as concatenated codes may be regarded 
as special cases of scalar codes with locality. This viewpoint leads us to an upper bound on the minimum distance 
of concatenated codes that is often tighter than what is currently known. 

Let $\mathcal{C}$ denote an $[n, k, d_{\min}]$ linear code over $\mathbb{F}_q$ and let G denote a generator matrix of
$\mathcal{C}$. Also, let $c = (c_1, \ldots, c_n)$ denote a codeword of $\mathcal{C}$. The code $\mathcal{C}$ will also
be referred to as a scalar code (considering elements of $\mathbb{F}_q$ as scalars).

Definition 1 ($(r, \delta)$ code symbol locality): The i-th code symbol $c_i$, $i \in [n]$, of $\mathcal{C}$ is said
to have $(r, \delta)$ locality, $\delta \ge 2$, if there exists a punctured code of $\mathcal{C}$ with support
containing i, whose length is at most $r + \delta - 1$, and whose minimum distance is at least $\delta$, i.e., there
exists a subset $S_i \subseteq [n]$ such that

• $i \in S_i$, $|S_i| \le r + \delta - 1$ and

• $d_{\min}(\mathcal{C}|_{S_i}) \ge \delta$, where $\mathcal{C}|_{S_i}$ denotes the code obtained when $\mathcal{C}$ is
punctured to the set of coordinates corresponding to $S_i$.

Since the punctured code has length at most $r + \delta - 1$ and minimum distance at least $\delta$, it follows from
the Singleton bound that $\dim(\mathcal{C}|_{S_i}) \le r$.

Definition 2 ($(r, \delta)$ information locality): The code $\mathcal{C}$ is said to have $(r, \delta)$ information
locality if $\mathcal{C}$ has a set of punctured codes $\{\mathcal{C}_i\}_{i \in \mathcal{L}}$ with supports
$\{S_i\}_{i \in \mathcal{L}}$, respectively, such that, for all $i \in \mathcal{L}$, we have

• $|S_i| \le r + \delta - 1$,

• $d_{\min}(\mathcal{C}_i) \ge \delta$, and

• $\mathrm{Rank}\left( G|_{\cup_{i \in \mathcal{L}} S_i} \right) = k$.

Here $\mathcal{L}$ denotes the index set for the local codes and by $G|_S$ we denote the restriction of G to the set
of columns indexed by the set S.

If further $\cup_{i \in \mathcal{L}} S_i = [n]$, then the code is said to have $(r, \delta)$ all-symbol locality.

The above definition for information locality is equivalent to saying that there exists a set of k independent
columns of G, indexed by the set $\mathcal{I} \subseteq [n]$, $|\mathcal{I}| = k$, such that all the k code symbols
$c_i$, $i \in \mathcal{I}$, have $(r, \delta)$ locality. The (r, d) codes introduced by Gopalan et al. correspond to
$(r, \delta = 2)$ in the present notation. We also note that if $\mathcal{C}$ has $(r, \delta)$ information locality,
then it must be true that $d_{\min} \ge \delta$.
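The conditions of Definitions 1 and 2 can be checked by brute force on a small code. The sketch below is an illustration only: it uses a binary code formed from two support-disjoint [3, 2, 2] single-parity-check codes (n = 6, k = 4, r = 2, $\delta$ = 2), a toy example that does not appear in the paper, and it indexes coordinates from 0 rather than 1.

```python
from itertools import product

def codewords(G):
    """All codewords of the binary linear code generated by the rows of G."""
    k, n = len(G), len(G[0])
    for m in product((0, 1), repeat=k):
        yield tuple(sum(m[i] * G[i][j] for i in range(k)) % 2 for j in range(n))

def punctured_min_distance(G, S):
    """Minimum nonzero weight of the code punctured to the coordinates in S."""
    weights = {sum(c[j] for j in S) for c in codewords(G)}
    weights.discard(0)
    return min(weights)

def has_locality(G, i, S, r, delta):
    """Check the two conditions of Definition 1 for symbol i and support S."""
    return i in S and len(S) <= r + delta - 1 and punctured_min_distance(G, S) >= delta

# Two support-disjoint [3, 2, 2] single-parity-check codes: n = 6, k = 4.
G = [
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
]
supports = [(0, 1, 2), (3, 4, 5)]
r, delta = 2, 2

# Every symbol lies in a local code of length r + delta - 1 = 3 and distance 2,
# and the supports cover all coordinates, so the code has all-symbol locality.
assert all(has_locality(G, i, S, r, delta) for S in supports for i in S)
assert punctured_min_distance(G, range(6)) >= delta   # d_min >= delta, as noted above
```

The exhaustive search is exponential in k and is meant only for toy parameters; it is not a practical way to verify locality for codes of realistic size.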



A. Upper Bound on Minimum Distance and Structure of Optimal Codes 

An upper bound on the minimum distance of codes with $(r, \delta)$ information locality was established in [7] for
the case $\delta = 2$ and subsequently extended in [1] to the general case. The general result is presented in
Theorem 3.1 below.

Theorem 3.1: Let $\mathcal{C}$ be an $[n, k, d_{\min}]$ scalar code with $(r, \delta)$ information locality. Then the
minimum distance $d_{\min}$ of code $\mathcal{C}$ is upper bounded by

$$d_{\min} \le n - k + 1 - \left( \left\lceil \frac{k}{r} \right\rceil - 1 \right)(\delta - 1). \qquad (4)$$

Proof: See Appendix A.



Any code achieving the bound in Theorem 3.1 with equality will be referred to as an optimal code having
$(r, \delta)$ information locality (or all-symbol locality if $\mathcal{C}$ has all-symbol locality).

We next deduce the structure of the code C for the case when r \ k and C is an optimal code having (r, 5) 
information locality. 

Theorem 3.2: Let r \ k, set - = t and let C be an optimal code having (r, 5) information locality. Also, as in 
Definition |2j let C denote the index set for all the local codes of C. Then 

(a) the local code Cj must be an [r + 5 — 1, r, 6] MDS code, V i G C. 

(b) the local codes must all have disjoint supports, i.e., Si n Sj = 4>, Vi, j 6 C, i / j, and 

(c) for any set of distinct indices i±, • ■ ■ , it G £ it must be that 

dim ^ t n fxxj j =°- ( 5 ) 

From this it follows that, up to permutation of columns, the $(k \times n)$ generator matrix $G$ of $\mathcal{C}$ can be expressed
in the form

$$ G = \left[ \begin{array}{ccc|c} G_1 & & & \\ & \ddots & & A \\ & & G_t & \end{array} \right], \tag{6} $$

where $G_i$ is the $(r \times (r + \delta - 1))$ generator matrix of an $[r + \delta - 1, r, \delta]$ MDS code for all $1 \leq i \leq t$, and $A$ is some
$(k \times (n - t(r + \delta - 1)))$ matrix.

Proof: See Appendix B.



B. Constructions of Optimal Codes with Locality

Three constructions, all optimal with respect to the bound on $d_{\min}$ in (4), are discussed here. We begin by showing
that the Pyramid codes of [20] are optimal with respect to $(r, \delta)$ information locality. We then study codes with
$(r, \delta)$ all-symbol locality for the case when $(r + \delta - 1) \mid n$. For the case when the block length is of the form
$n = \lceil \frac{k}{r} \rceil (r + \delta - 1)$, we provide an explicit construction of a code with all-symbol locality by splitting the rows
of the parity check matrix of an appropriate MDS code. We will refer to this as the parity-splitting construction.
Finally, the existence of optimal codes with all-symbol locality is shown for the case when $(r + \delta - 1) \mid n$.

1) Optimality of the Pyramid Code Construction:

We will now show that under a suitable choice of parameters, the Pyramid code construction appearing in [20]
achieves the bound in Theorem 3.1 with equality. For the sake of completeness, the construction is reproduced
below.

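Consider an $[n', k, d_{\min}]$ systematic MDS code over $\mathbb{F}_q$, where $n' = k + d_{\min} - 1$, having generator matrix of
the form

$$ G = \left[\; I_{(k \times k)} \;\middle|\; Q_{(k \times (d_{\min} - 1))} \;\right]. \tag{7} $$

The pyramid-code construction will then proceed to modify $G$ to obtain the generator matrix of the desired optimal
code. Let $k = \alpha r + \beta$, with $0 \leq \beta \leq (r - 1)$. First the matrix $Q$ is partitioned into submatrices as shown below:

$$ Q = \left[ \begin{array}{c|c} Q_1 & \\ \vdots & \\ Q_\alpha & Q' \\ Q_{\alpha+1} & \end{array} \right], \tag{8} $$

where $Q_i$, $1 \leq i \leq \alpha$, are matrices of size $(r \times (\delta - 1))$, $Q_{\alpha+1}$ is of size $(\beta \times (\delta - 1))$ and $Q'$ is a $(k \times (d_{\min} - \delta))$
matrix. Next, consider a second generator matrix $G'$ obtained by splitting the first $(\delta - 1)$ columns of $Q$ as shown
below:

$$ G' = \left[ \begin{array}{ccccccc|c} I_r & Q_1 & & & & & & \\ & & \ddots & & & & & \\ & & & I_r & Q_\alpha & & & Q' \\ & & & & & I_\beta & Q_{\alpha+1} & \end{array} \right]. \tag{9} $$

Note that $G'$ is a $(k \times n)$ full rank matrix, where

$$ n = k + d_{\min} - 1 + \left( \left\lceil \frac{k}{r} \right\rceil - 1 \right)(\delta - 1). \tag{10} $$

Clearly, by comparing the matrices $G$ and $G'$, it follows that the code $\mathcal{C}$ generated by $G'$ has minimum distance
no smaller than $d_{\min}$. Furthermore, $\mathcal{C}$ is a code with $(r, \delta)$ information locality. Hence, it follows from (10) that $\mathcal{C}$
is an optimal code having $(r, \delta)$ information locality.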



Example 2: Let $G$ be the generator matrix of a $[7, 4, 4]$ systematic MDS code:

$$ G = \begin{bmatrix} 1 & 0 & 0 & 0 & g_{11} & g_{12} & g_{13} \\ 0 & 1 & 0 & 0 & g_{21} & g_{22} & g_{23} \\ 0 & 0 & 1 & 0 & g_{31} & g_{32} & g_{33} \\ 0 & 0 & 0 & 1 & g_{41} & g_{42} & g_{43} \end{bmatrix}. $$

We construct a revised generator matrix $G_{\text{pyr}}$ by splitting the first two parity columns and then rearranging columns:

$$ G_{\text{pyr}} = \begin{bmatrix} 1 & 0 & g_{11} & g_{12} & 0 & 0 & 0 & 0 & g_{13} \\ 0 & 1 & g_{21} & g_{22} & 0 & 0 & 0 & 0 & g_{23} \\ 0 & 0 & 0 & 0 & 1 & 0 & g_{31} & g_{32} & g_{33} \\ 0 & 0 & 0 & 0 & 0 & 1 & g_{41} & g_{42} & g_{43} \end{bmatrix}. $$

The pyramid code $\mathcal{C}$ is then the code with generator matrix $G_{\text{pyr}}$. It can be verified that for $L_1 = \{1, 2, 3, 4\}$ and
$L_2 = \{5, 6, 7, 8\}$, $G_{\text{pyr}}|_{L_1}$ and $G_{\text{pyr}}|_{L_2}$ are both generator matrices of $[4, 2, 3]$ MDS codes. Further, it is easy to see
that $\text{Rank}(G_{\text{pyr}}|_{L_1 \cup L_2}) = 4$. It is also straightforward to show that the minimum distance of the pyramid code is
no smaller than that of the parent $[7, 4, 4]$ MDS code. It turns out that the pyramid code is optimal with respect to
the bound in (4) and hence has code parameters $[9, 4, 4]$.
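The claims of Example 2 can be checked by brute force on a small instance. The sketch below is an illustration under stated assumptions, not the construction of [20] verbatim: it assumes the prime field $\mathbb{F}_{11}$, takes the parity part of the parent $[7, 4, 4]$ code to be a Cauchy matrix (every square submatrix of which is invertible, so the systematic parent code is MDS), and uses hypothetical helper names throughout.

```python
from itertools import product

FIELD = 11  # assumed prime field size for this illustration


def cauchy_matrix(xs, ys, q=FIELD):
    """Cauchy matrix with entries 1 / (x - y) over GF(q); xs and ys are disjoint."""
    return [[pow((x - y) % q, q - 2, q) for y in ys] for x in xs]


def pyramid_generator(Q, r, delta):
    """Split the first delta - 1 parity columns of [I | Q] across groups of r rows."""
    k = len(Q)
    groups = [list(range(s, min(s + r, k))) for s in range(0, k, r)]
    G = []
    for i in range(k):
        row = []
        for g in groups:
            row += [1 if j == i else 0 for j in g]                        # identity block
            row += [Q[i][c] if i in g else 0 for c in range(delta - 1)]   # split parities
        row += Q[i][delta - 1:]                                           # global parities
        G.append(row)
    return G


def min_distance(G, q=FIELD, cols=None):
    """Minimum nonzero Hamming weight of the code restricted to the columns in cols."""
    cols = list(range(len(G[0]))) if cols is None else list(cols)
    best = None
    for msg in product(range(q), repeat=len(G)):
        weight = sum(
            1 for c in cols if sum(m * G[i][c] for i, m in enumerate(msg)) % q
        )
        if weight and (best is None or weight < best):
            best = weight
    return best


Q = cauchy_matrix([0, 1, 2, 3], [4, 5, 6])
G_pyr = pyramid_generator(Q, r=2, delta=3)
d_global = min_distance(G_pyr)                      # whole [9, 4] pyramid code
d_local_1 = min_distance(G_pyr, cols=range(0, 4))   # punctured to L_1 (0-indexed)
d_local_2 = min_distance(G_pyr, cols=range(4, 8))   # punctured to L_2 (0-indexed)
```

The exhaustive search visits all $11^4$ messages, which is affordable only for toy parameters such as these.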



2) Optimality of a Parity-Splitting Code Construction: 

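Theorem 3.3: Let $n = \lceil \frac{k}{r} \rceil (r + \delta - 1)$. Then, for $q > n$, there exists an explicit and optimal $[n, k, d_{\min}]$ linear
code over $\mathbb{F}_q$, having $(r, \delta)$ all-symbol locality.

Proof: Let $H'$ be the parity check matrix of an $[n, k', d]$ Reed-Solomon code over $\mathbb{F}_q$, where $k' = k + (\lceil \frac{k}{r} \rceil - 1)(\delta - 1)$
and the minimum distance is $d = n - k' + 1 = n - k + 1 - (\lceil \frac{k}{r} \rceil - 1)(\delta - 1)$. Such codes exist if $q > n$. We
choose $H'_{(n - k') \times n}$ to be a Vandermonde matrix. Let

$$ H' = \begin{bmatrix} Q_{(\delta - 1) \times n} \\ A_{(d - \delta) \times n} \end{bmatrix}. \tag{11} $$

We next partition the matrix $Q$ into submatrices as shown below:

$$ Q = \left[\; Q_1 \mid Q_2 \mid \cdots \mid Q_{\lceil k/r \rceil} \;\right], \tag{12} $$

in which the matrices $\{Q_i,\ i = 1, \ldots, \lceil \frac{k}{r} \rceil\}$ are of uniform size $((\delta - 1) \times (r + \delta - 1))$. Now, consider the code $\mathcal{C}$
whose parity check matrix, $H$, is obtained by splitting the first $\delta - 1$ rows of $H'$ as follows:

$$ H = \left[ \begin{array}{ccc} Q_1 & & \\ & \ddots & \\ & & Q_{\lceil k/r \rceil} \\ \hline & A & \end{array} \right]. \tag{13} $$

It is clear from the construction that code $\mathcal{C}$ has $(r, \delta)$ all-symbol locality. Let $K$ denote the dimension of the code
$\mathcal{C}$. We will now show that $\mathcal{C}$ is an optimal $[n, k, d_{\min}]$ code, having $(r, \delta)$ all-symbol locality, by showing that

• $K = k$ and

• the minimum distance $d_{\min}$ of $\mathcal{C}$ is given by the equality condition in (4).

To see this, first of all note that the dimension of $\mathcal{C}^{\perp}$ is upper bounded as

$$ \begin{aligned} \dim(\mathcal{C}^{\perp}) &\overset{(a)}{\leq} \left\lceil \frac{k}{r} \right\rceil (\delta - 1) + d - \delta \\ &\overset{(b)}{=} \left\lceil \frac{k}{r} \right\rceil (\delta - 1) + \left( n - k - \left( \left\lceil \frac{k}{r} \right\rceil - 1 \right)(\delta - 1) + 1 \right) - \delta \\ &= n - k, \end{aligned} \tag{14} $$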



where (a) follows by counting the number of rows of $H$ and (b) follows since $d = n - k' + 1$. Thus, from (14),
we get that

$$ K \geq k. \tag{15} $$

Next, we note by inspection of the matrices $H$, $H'$ that any vector which is in the null-space of $H$ is also in the
null-space of $H'$. It follows that the minimum distance $d_{\min}$ of the parity-splitting construction is at least that of
the parent Reed-Solomon code having parity-check matrix $H'$. Thus

$$ d_{\min} \geq d = n - k + 1 - \left( \left\lceil \frac{k}{r} \right\rceil - 1 \right)(\delta - 1). \tag{16} $$

But, since $\mathcal{C}$ has $(r, \delta)$ locality, from (4), we must also have that

$$ d_{\min} \leq n - K + 1 - \left( \left\lceil \frac{K}{r} \right\rceil - 1 \right)(\delta - 1). \tag{17} $$

From (16) and (17), we get that

$$ k + \left( \left\lceil \frac{k}{r} \right\rceil - 1 \right)(\delta - 1) \geq K + \left( \left\lceil \frac{K}{r} \right\rceil - 1 \right)(\delta - 1), \tag{18} $$

which together with (15) implies that $K = k$ and also that $\mathcal{C}$ is optimal with respect to (4).
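The parity-splitting construction can likewise be checked exhaustively on a toy instance. The sketch below is an illustration under assumptions that are not in the original: the field $\mathbb{F}_7$, evaluation points $1, \ldots, 6$, parameters $k = 3$, $r = 2$, $\delta = 2$ (so $n = 6$ and $d = 3$), and hypothetical function names.

```python
from itertools import product


def parity_splitting_check_matrix(points, r, delta, d, q):
    """Split the first delta - 1 rows of a Vandermonde parity-check matrix H'."""
    n = len(points)
    n_loc = r + delta - 1
    assert n % n_loc == 0, "block length must be a multiple of r + delta - 1"
    # Rows of H' are x^0, x^1, ..., x^(d-2) evaluated at the chosen points.
    h_prime = [[pow(x, e, q) for x in points] for e in range(d - 1)]
    h = []
    for g in range(n // n_loc):
        lo, hi = g * n_loc, (g + 1) * n_loc
        for row in h_prime[:delta - 1]:
            # Keep the row only on the support of local code g.
            h.append([v if lo <= j < hi else 0 for j, v in enumerate(row)])
    h.extend(h_prime[delta - 1:])  # the rows of A are kept unsplit
    return h


def codewords(h, q):
    """Enumerate the null space of h over GF(q) by exhaustive search."""
    for c in product(range(q), repeat=len(h[0])):
        if all(sum(a * b for a, b in zip(row, c)) % q == 0 for row in h):
            yield c


q, k, r, delta = 7, 3, 2, 2
num_local = -(-k // r)                               # ceil(k / r) = 2
n = num_local * (r + delta - 1)                      # 6
d = n - k + 1 - (num_local - 1) * (delta - 1)        # 3
H = parity_splitting_check_matrix(list(range(1, n + 1)), r, delta, d, q)
words = list(codewords(H, q))
d_min = min(sum(1 for s in w if s) for w in words if any(w))
```

Here the code has $7^3$ codewords, so its dimension equals $k = 3$, and the minimum distance found by the search can be compared against the right-hand side of (4).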
3) Existence of Optimal $(r, \delta)$ Codes with All-Symbol Locality:

Theorem 3.4: Let $q > k n^k$ and $(r + \delta - 1) \mid n$. Then there exists an optimal $[n, k, d_{\min}]$ code with $(r, \delta)$
all-symbol locality over $\mathbb{F}_q$.

Proof: See Appendix C.



C. An Upper Bound to the Minimum Distance of Concatenated Codes 

Consider a (serially) concatenated code (see [33], [41]) having an $[n_1, k_1, d_1]$ code $\mathcal{A}$ as the inner code and an
$[n_2, k_2, d_2]$ code $\mathcal{B}$ as the outer code. Clearly, a concatenated code falls into the category of a code with $(r, \delta)$
all-symbol locality with $\delta = d_1$, $r = n_1 - d_1 + 1$. Hence, the bound in (4) applies to concatenated codes as well.
Using the fact that a concatenated code has length $n = n_1 n_2$ and dimension $k = k_1 k_2$, we obtain from Theorem 3.1
the following upper bound on minimum distance $d_{\min}$:

$$ d_{\min} \leq n_1 n_2 - k_1 k_2 + 1 - \left( \left\lceil \frac{k_1 k_2}{n_1 - d_1 + 1} \right\rceil - 1 \right)(d_1 - 1). \tag{19} $$

Well known bounds on the minimum distance of a concatenated code are

$$ d_1 d_2 \leq d_{\min} \leq n_1 d_2. \tag{20} $$

In practice, concatenated codes often employ an interleaver between the inner and outer codes in order to increase
the minimum distance [33]. In this case, while the upper bound in (20) no longer holds, the bound in (19) continues
to hold, since the code continues to possess all-symbol locality even after interleaving.
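The bounds (19) and (20) can be compared numerically for given component-code parameters. A minimal sketch follows; the function name `concatenated_bounds` and the sample parameters are illustrative choices, not taken from the paper.

```python
def concatenated_bounds(n1, k1, d1, n2, k2, d2):
    """Return (lower bound d1*d2, locality bound (19), upper bound n1*d2)."""
    r, delta = n1 - d1 + 1, d1                 # locality parameters of the inner code
    num_local = -(-(k1 * k2) // r)             # ceil(k1 * k2 / r)
    locality = n1 * n2 - k1 * k2 + 1 - (num_local - 1) * (delta - 1)
    return d1 * d2, locality, n1 * d2


# Inner [4, 2, 3] MDS code and outer [5, 3, 3] MDS code.
lower, locality, upper = concatenated_bounds(4, 2, 3, 5, 3, 3)
```

For these MDS component codes the locality bound evaluates to $n_1 d_2 - (k_1 - 1) = 11$, one less than the classical upper bound $n_1 d_2 = 12$.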



We observe that even when an interleaver is not used, (19) is tighter than (20) if both the codes are MDS and
the dimension of the first code $k_1 > 1$. In this case $k_1 = n_1 - d_1 + 1$ and $k_2 = n_2 - d_2 + 1$. Then (19) gives us that

$$ \begin{aligned} d_{\min} &\leq n_1 n_2 - k_1 k_2 + 1 - \left( \left\lceil \frac{k_1 k_2}{k_1} \right\rceil - 1 \right)(d_1 - 1) \\ &= n_1 (k_2 + d_2 - 1) - k_1 k_2 + 1 - (k_2 - 1)(d_1) + k_2 - 1 \\ &= n_1 k_2 + n_1 d_2 - n_1 - k_1 k_2 + 1 - (k_2)(n_1 - k_1 + 1) + n_1 - k_1 + 1 + k_2 - 1 \\ &= n_1 d_2 - (k_1 - 1) \end{aligned} \tag{21} $$

$$ < n_1 d_2, \tag{22} $$

when $k_1 > 1$.

IV. Vector Codes

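Our aim here is to extend the notions of locality to include codes whose local codes are codes such as regenerating
codes. Vector codes, by which we mean codes over a vector symbol alphabet, provide the appropriate setting for
such an extension.

Definition 3: An $\mathbb{F}_q$-linear vector code of block length $n$ is a code $\mathcal{C}$ having a symbol alphabet $\mathbb{F}_q^{\alpha}$ for some
$\alpha \geq 1$, i.e.,

$$ \mathcal{C} = \{ \mathbf{c} = (\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n),\ \mathbf{c}_i \in \mathbb{F}_q^{\alpha},\ \text{all } i \in [n] \}, \tag{23} $$

satisfying the additional property that given $\mathbf{c}, \mathbf{c}' \in \mathcal{C}$ and $a, b \in \mathbb{F}_q$,

$$ a\mathbf{c} + b\mathbf{c}' = (a\mathbf{c}_1 + b\mathbf{c}'_1,\ a\mathbf{c}_2 + b\mathbf{c}'_2,\ \ldots,\ a\mathbf{c}_n + b\mathbf{c}'_n) \tag{24} $$

also belongs to $\mathcal{C}$, in which $a\mathbf{c}_i$ is simply the scalar multiplication of the vector $\mathbf{c}_i$.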

We will refer to symbols from $\mathbb{F}_q$, $\mathbb{F}_q^{\alpha}$ as scalar and vector symbols respectively. The field $\mathbb{F}_q$ will be termed
the base field and the parameter $\alpha$ the vector-size parameter. Associated with the vector code $\mathcal{C}$ is an $\mathbb{F}_q$-linear
scalar code $\mathcal{C}^{(s)}$ of length $N = n\alpha$, where $\mathcal{C}^{(s)}$ is obtained by expanding each vector symbol within a codeword
into $\alpha$ scalar symbols (in some prescribed order). Conversely, the scalar code $\mathcal{C}^{(s)}$ also uniquely determines the
vector code $\mathcal{C}$ if one is given a priori the manner in which sets of $\alpha$ scalar code symbols are to be grouped together
to obtain the corresponding vector symbols. We will assume the canonical grouping, in which the first $\alpha$ scalar
symbols form the first vector code symbol, etc. We also use $K$ to denote the dimension of the scalar code $\mathcal{C}^{(s)}$ and
often refer to it as the scalar dimension of the code $\mathcal{C}$.

Given a generator matrix $G$ for the scalar code $\mathcal{C}^{(s)}$, the first code symbol in the vector code is naturally associated
with the first $\alpha$ columns of $G$, etc. We will refer to the collection of $\alpha$ columns of $G$ associated with the $i$-th code
symbol $\mathbf{c}_i$ as the $i$-th thick column. To avoid confusion, we will refer to the columns of $G$ themselves as thin
columns, and hence there are $\alpha$ thin columns per thick column of the generator matrix. We will assume that the
$\alpha$ thin columns comprising any thick column are linearly independent, which is equivalent to saying that as the
codewords run through the code $\mathcal{C}$, the $i$-th code symbol $\mathbf{c}_i$ takes on all possible values from $\mathbb{F}_q^{\alpha}$. We will also
use $W_i$ to denote the ($\alpha$-dimensional) subspace of $\mathbb{F}_q^{K}$ associated with the $\alpha$ thin columns making up the $i$-th thick
column.

Given a subset $\mathcal{I} \subseteq [n]$, we use $G|_{\mathcal{I}}$ to denote the restriction of $G$ to the set of thick columns with indices lying
in $\mathcal{I}$. We will declare $\mathcal{I}$ to be an information set for $\mathcal{C}$ if

$$ \text{rank}(G|_{\mathcal{I}}) = K \tag{25} $$

and if further, no proper subset of $\mathcal{I}$ possesses this property. The requirement in (25) is equivalent to stating that

$$ \sum_{i \in \mathcal{I}} W_i = \mathbb{F}_q^{K}. \tag{26} $$

Since the subspaces $\{W_i,\ i = 1, \ldots, n\}$ can have non-trivial intersection, it follows that information sets can be of
different cardinality. We use $\kappa$ to denote the minimum cardinality of an information set:

$$ \kappa = \min_{\text{information sets } \mathcal{I} \text{ of } \mathcal{C}} |\mathcal{I}| \tag{27} $$

and will refer to $\kappa$ as the quasi-dimension of the code $\mathcal{C}$ or q-dim($\mathcal{C}$). Any $\mathcal{I}$ such that $|\mathcal{I}| = \kappa$ will be referred to
as a minimum cardinality information set.

Remark 1: Scalar codes correspond to vector codes with $\alpha = 1$. The quasi-dimension $\kappa$ of a scalar code equals
its dimension $k$.

The (Hamming) distance between any two codewords $\mathbf{c}$ and $\mathbf{c}'$ of $\mathcal{C}$ is the number of vector symbols in which
$\mathbf{c}$ and $\mathbf{c}'$ differ. Since $\mathcal{C}$ is $\mathbb{F}_q$-linear, it follows that the minimum distance, $d_{\min}$, of $\mathcal{C}$ is equal to the minimum
Hamming weight of a non-zero codeword in $\mathcal{C}$.
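To illustrate that information sets of a vector code can have different cardinalities, the sketch below enumerates them for a small hand-made example. Everything in it is an assumption made for illustration: the field $\mathbb{F}_2$, $\alpha = 2$, $K = 4$, $n = 4$, thick columns spanning $\{e_1, e_2\}$, $\{e_3, e_4\}$, $\{e_1, e_3\}$ and $\{e_2, e_3\}$, and the helper names.

```python
from itertools import combinations


def rank_mod_p(rows, p):
    """Rank of a matrix over GF(p), p prime, by Gauss-Jordan elimination."""
    rows = [list(r) for r in rows]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for c in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][c] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][c], p - 2, p)
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][c] % p:
                f = rows[i][c]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def restrict(G, thick_idx, alpha):
    """Restriction of G to the thick columns listed in thick_idx."""
    cols = [i * alpha + t for i in thick_idx for t in range(alpha)]
    return [[row[c] for c in cols] for row in G]


def information_sets(G, n, alpha, p):
    """All sets of thick columns with rank K that are minimal with this property."""
    K = rank_mod_p(G, p)
    found = []
    for size in range(1, n + 1):
        for I in combinations(range(n), size):
            if rank_mod_p(restrict(G, I, alpha), p) != K:
                continue
            # Rank is monotone, so it suffices to test subsets with one index removed.
            if all(rank_mod_p(restrict(G, J, alpha), p) < K
                   for J in combinations(I, size - 1)):
                found.append(I)
    return found


# Thin columns: e1 e2 | e3 e4 | e1 e3 | e2 e3 (thick columns are 0-indexed below).
G = [
    [1, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
]
info_sets = information_sets(G, n=4, alpha=2, p=2)
kappa = min(len(I) for I in info_sets)
```

In this toy code the thick columns $\{0, 1\}$ form an information set of size 2, while $\{1, 2, 3\}$ is an information set of size 3, so the quasi-dimension is 2.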



As noted in Section I, these codes can equivalently also be regarded as array codes.

In the distributed storage context, $\alpha$ is also the node-size parameter, as it denotes the number of symbols contained in a node.

We will refer to a vector code of block length $n$, scalar dimension $K$, minimum distance $d_{\min}$, vector-size
parameter $\alpha$ and quasi-dimension $\kappa$ as an $[n, K, d_{\min}, \alpha, \kappa]$ code. This notation will be simplified to $[n, K, d_{\min}]$
whenever the vector-size parameter $\alpha$ and the quasi-dimension $\kappa$ are either clear from the context or else not
relevant to the discussion. If $[N, K, D_{\min}]$ are the parameters of the scalar code $\mathcal{C}^{(s)}$, it is easily verified that

$$ \left\lceil \frac{K}{\alpha} \right\rceil \leq \kappa \leq \left\lceil \frac{N - D_{\min} + 1}{\alpha} \right\rceil. \tag{28} $$

We define the rate $\rho(\mathcal{C})$ of an $[n, K, d_{\min}, \alpha, \kappa]$ vector code $\mathcal{C}$ as the quantity

$$ \rho(\mathcal{C}) = \frac{K}{n\alpha}. \tag{29} $$

Since $\lceil \frac{K}{\alpha} \rceil \leq \kappa$, it follows that

$$ \rho(\mathcal{C}) \leq \frac{\kappa}{n}. \tag{30} $$



A. Singleton and Erasure Bounds 

The Singleton bound on the size $q^K$ of the vector code $\mathcal{C}$ yields

$$ q^{K} \leq q^{\alpha (n - d_{\min} + 1)} \quad \text{(Singleton bound on code size)}, \tag{31} $$

which gives us:

$$ d_{\min} \leq n - \left\lceil \frac{K}{\alpha} \right\rceil + 1 \quad \text{(Singleton bound on minimum distance)}. \tag{32} $$

We will refer to codes achieving the Singleton bound (31) with equality as vector MDS codes. Several constructions
of vector MDS codes are known in the literature; see, for example, [29], [30], [31], [36], [37].

A second bound arises from noting that, given any information set $\mathcal{I}$, the minimum distance is upper bounded
by

$$ d_{\min} \leq n - |\mathcal{I}| + 1. \tag{33} $$

This follows since the minimality inherent in our definition of an information set implies the existence of a non-zero
codeword which is zero on $(|\mathcal{I}| - 1)$ symbols. In particular, since $\kappa$ is the smallest possible size of an information
set, we have that

$$ d_{\min} \leq n - \kappa + 1 \quad \text{(erasure bound)}. \tag{34} $$

The converse implication of (33) is that $n - d_{\min} + 1$ is the largest possible size of an information set for the code.
We will refer to (34) as the erasure bound for vector codes. Note that since $\kappa \geq \lceil \frac{K}{\alpha} \rceil$, the erasure bound in (34) is
in general tighter than the Singleton bound in (32).

Equality in (31) holds only if $K = \kappa\alpha$, and in this case $d_{\min} = n - \frac{K}{\alpha} + 1$, whereas equality in (32) can hold
even if $\alpha \nmid K$. We will say that a vector code is systematic if, through a sequence of elementary row operations and
thick-column permutations, the generator matrix $G$ can be reduced to the form $G = [I_{\kappa\alpha} \mid P_{\kappa\alpha \times (n - \kappa)\alpha}]$. Without
loss of generality, one can assume that the generator matrix $G$ of a vector MDS code is in systematic form, i.e., is
of the form $G = [I \mid P]$, where $I$ is an identity matrix of size $K$ and corresponds to $\kappa$ thick columns, while $P$ is a
$K \times (N - K)$ matrix. A characterization of a vector MDS code in terms of its generator matrix is presented next.
The proof is analogous to the scalar case and hence omitted.

Lemma 4.1: Any $[n, K, d_{\min}, \alpha, \kappa]$ vector code $\mathcal{C}$ is MDS if and only if the generator matrix can be represented
in the form $G = [I \mid P]$, where the $K \times (N - K)$ matrix

$$ P = \begin{bmatrix} G_{1,1} & G_{1,2} & \cdots & G_{1,n-\kappa} \\ G_{2,1} & G_{2,2} & \cdots & G_{2,n-\kappa} \\ \vdots & \vdots & & \vdots \\ G_{\kappa,1} & G_{\kappa,2} & \cdots & G_{\kappa,n-\kappa} \end{bmatrix} $$

possesses the property that every square block submatrix of $P$ is invertible. Here, the $\{G_{i,j}\}$ are square sub-matrices
of size $\alpha \times \alpha$, and by a block submatrix, we mean a submatrix whose entries belong to the $\{G_{i,j}\}$.

B. Puncturing and Shortening of a Vector Code 

Given any set $S \subseteq [n]$, we use $\mathcal{C}|_S$ to denote the restriction of the code to the set $S$ and will refer to this code
as the code $\mathcal{C}$ punctured to set $S$. Unlike in the scalar case, the quasi-dimension of a punctured code, $\mathcal{C}|_S$, can be
either larger or smaller than q-dim($\mathcal{C}$).

We define the shortened code $\mathcal{C}|^{S}$ as the code obtained by first restricting attention to those codewords whose
code symbols are zero on the complement $S^c$ of $S$ and then deleting the coordinates associated with $S^c$, leaving
behind a code of length $|S|$.$^{4}$ The lemma below describes the effect of shortening a vector MDS code. The proof
is identical to that of the scalar case and is omitted.

Lemma 4.2: Given an $[n, K, d_{\min} = n - \kappa + 1, \alpha, \kappa = \frac{K}{\alpha}]$ vector MDS code $\mathcal{C}$, and a set $S \subseteq [n]$ such that
$n - |S| < \kappa$, the shortened vector code $\mathcal{C}|^{S}$ is also vector MDS with parameters $[|S|, \kappa'\alpha, d_{\min}, \alpha, \kappa' = \kappa - (n - |S|)]$.

C. Regenerating Codes as Vector Codes 

Let $\mathcal{C}$ denote an $((n, k, d), (\alpha, \beta), B)$ regenerating code, as discussed in Section I-A. The class of regenerating
codes under consideration here will all be linear and will have the property that all the $\alpha$ scalar symbols contained
within a node are linearly independent, and hence these codes fall within the framework of vector codes considered
here. Recall that the reconstruction property of a regenerating code says that the entire file can be recovered given
the contents of any set of $k$ nodes, and hence it follows that the minimum distance $d_{\min}$ of $\mathcal{C}$ is lower bounded by

$$ d_{\min} \geq n - k + 1. \tag{35} $$

The lemmas below deal with the quasi-dimension of MSR and exact-repair MBR regenerating codes as well as the
impact of puncturing and shortening these codes.



Lemma 4.3: Any MSR code (either exact or functional repair) is vector MDS, i.e., achieves (31) with equality,
and has quasi-dimension $\kappa = k$.

Proof: The scalar dimension (file size) of an MSR code is given by $K = B = k\alpha$, which implies that the
quasi-dimension $\kappa \geq k$. On the other hand, from the data reconstruction property, one can recover all the data by
connecting to any set of $k$ nodes, and hence $\kappa = k$, which implies in turn that $K = \kappa\alpha$. This along with (35) implies
that the code is vector MDS. ■

Remark 2: When we say that a functional-repair MSR code is vector MDS, we will mean that the code remains
vector MDS after every repair operation.

Corollary 4.4: The generator matrix $G$ of any MSR code can be represented in systematic form $G = [I_K \mid P]$.

Lemma 4.5: Any exact-repair MBR code is optimal with respect to the erasure bound and has quasi-dimension
$\kappa = k$.

Proof: The fact that the quasi-dimension $\kappa = k$ follows from properties of exact-repair optimal regenerating
codes discussed in [8]. The erasure bound optimality then follows from (35).

Lemma 4.6: Suppose $\mathcal{C}$ is any $((n, k, d), (\alpha, \beta), B)$ regenerating code and $S \subseteq [n]$ is such that $|S| > d$. Then
the punctured code $\mathcal{C}|_S$ is also a regenerating code with parameters $((|S|, k, d), (\alpha, \beta), B)$.

Lemma 4.7 (Theorem 6 of ftSl): Suppose $\mathcal{C}$ is an $((n, k, d), (\alpha, \beta))$ MSR code and consider $S \subseteq [n]$ such that
$\gamma = n - |S| < k$. Then the shortened code $\mathcal{C}|^{S}$ is also an MSR code with parameters $((n - \gamma, k - \gamma, d - \gamma), (\alpha, \beta))$.

4 The generator matrix of the shortened code may not have the property that all thin columns associated with a thick column are linearly
independent. This issue, however, does not arise in the case of vector MDS codes.



D. Uniform Rank Accumulation Codes 

Definition 4: Let $\mathcal{C}$ be an $[n, K, d_{\min}, \alpha, \kappa]$ vector code having generator matrix $G$ and let $S_i$, $1 \leq i \leq n$, be an
arbitrary subset of $i$ thick columns of $G$. The code $\mathcal{C}$ is said to be a Uniform Rank Accumulation (URA) code, i.e.,
a code possessing the URA property, if the restriction $G|_{S_i}$ of $G$ to $S_i$ has rank equal to

$$ \sum_{j=1}^{i} a_j $$

for some set $\{a_1, a_2, \ldots, a_n\}$ of non-negative integers that are independent of the specific set $S_i$ of $i$ thick columns
chosen.

We will refer to the sequence $\{a_i,\ 1 \leq i \leq n\}$ as the rank accumulation profile of the code $\mathcal{C}$. Under the
definition of a vector code considered here, all the thin columns comprising a thick column in the generator matrix
of a vector code are linearly independent, and hence we have $a_1 = \alpha$, the vector-size parameter of $\mathcal{C}$. It is also
straightforward to show that

$$ \alpha = a_1 \geq a_2 \geq \cdots \geq a_{n-2} \geq a_{n-1} \geq a_n \geq 0, \tag{36} $$

and that

$$ \sum_{i=1}^{n} a_i = K. \tag{37} $$

Moreover, given that $\mathcal{C}$ has minimum distance $d_{\min}$, it follows that the last $d_{\min} - 1$ elements of the rank accumulation
profile must equal 0, i.e.,

$$ a_{n-i} = 0, \quad 0 \leq i \leq (d_{\min} - 2), \tag{38} $$

and

$$ a_{n - d_{\min} + 1} > 0. \tag{39} $$

Remark 3: Whenever $\mathcal{C}$ is a URA code, since any set of $\kappa$ thick columns of $G$ forms an information set for $\mathcal{C}$,
it follows that $\mathcal{C}$ is optimal with respect to the erasure bound in (34).

Remark 4: Clearly, any vector MDS code is a URA code, and it follows from Lemma 4.3 that this is also true
of MSR codes. For both these classes of codes we have

$$ a_i = \alpha, \quad 1 \leq i \leq \kappa, \qquad a_i = 0, \quad \kappa + 1 \leq i \leq n. \tag{40} $$

One can also show using the information accumulation profile for exact regenerating codes (see (H), that MBR
codes are also URA codes. In the case of the MBR codes, the rank accumulation profile $\{a_i\}$ is given by

$$ a_i = \alpha - i + 1, \quad 1 \leq i \leq k, \qquad a_i = 0, \quad k + 1 \leq i \leq n. \tag{41} $$

V. Locality in vector codes 

In this section we define the notion of locality in the context of vector codes, in a manner analogous to that of
scalar codes. We will specifically consider codes with locality where all the local codes are URA codes and obtain
an upper bound on their minimum distance. As in the scalar case, we will also deduce the structure of codes which
achieve the bound with equality. The discussion of minimum distance bounds for codes with locality, where local
codes are not necessarily URA codes, is deferred until Section VIII.

In all of the definitions below, $\mathcal{C}$ is an $[n, K, d_{\min}, \alpha, \kappa]$ vector code possessing a $(K \times n\alpha)$ generator matrix $G$.

Definition 5 ($(r, \delta)$ locality): The $i$-th vector code symbol $\mathbf{c}_i$, $i \in [n]$, of $\mathcal{C}$ is said to have $(r, \delta)$ locality, $\delta \geq 2$, if
there exists a punctured code of $\mathcal{C}$ with support containing $i$, whose length is at most $r + \delta - 1$, and whose minimum
distance is at least $\delta$, i.e., there exists a subset $S_i \subseteq [n]$ such that

• $i \in S_i$, $|S_i| \leq r + \delta - 1$ and

• $d_{\min}(\mathcal{C}|_{S_i}) \geq \delta$.

It follows from the erasure bound, given in (34), that q-dim($\mathcal{C}|_{S_i}$) $\leq r$.

Definition 6 ($(r, \delta)$ information locality): The code $\mathcal{C}$ is said to have $(r, \delta)$ information locality if there exists a
set of punctured codes $\{\mathcal{C}_i\}_{i \in \mathcal{L}}$ of $\mathcal{C}$ with respective supports $\{S_i\}_{i \in \mathcal{L}}$ such that

• $|S_i| \leq r + \delta - 1$,

• $d_{\min}(\mathcal{C}_i) \geq \delta$, and

• $\text{Rank}(G|_{\cup_{i \in \mathcal{L}} S_i}) = K$.

Here $\mathcal{L}$ denotes the index set for the local codes.

If further $\cup_{i \in \mathcal{L}} S_i = [n]$, then the code is said to have $(r, \delta)$ all-symbol locality.

The case of locality in vector codes with $\delta = 2$ has been previously considered in [23], where it was shown that
under $(r, \delta = 2)$ all-symbol locality, the minimum distance $d_{\min}$ of $\mathcal{C}$ is upper bounded by

$$ d_{\min} \leq n - \left\lceil \frac{K}{\alpha} \right\rceil + 1 - \left( \left\lceil \frac{K}{r\alpha} \right\rceil - 1 \right). \tag{42} $$



From an implementation point of view, it is desirable that the local codes be identical, and this prompts the
definition of exact locality. We define the code $\mathcal{C}$ to have exact $(r, \delta)$ information locality if $\mathcal{C}$ has $(r, \delta)$ information
locality such that $|S_i| = r + \delta - 1$ and $d_{\min}(\mathcal{C}_i) = \delta$, $\forall\, i \in \mathcal{L}$. In addition, if $\cup_{i \in \mathcal{L}} S_i = [n]$, then the code is said to
have exact $(r, \delta)$ all-symbol locality.

Let $\mathcal{U}$ (for Uniform rank accumulation) denote the class of $\mathbb{F}_q$-linear vector codes $\mathcal{C}$, where each code $\mathcal{C}$ is an
$[n, K, d_{\min}, \alpha, \kappa]$ vector code

• possessing exact $(r, \delta)$ information locality with $\delta \geq 2$,

• whose associated local codes $\{\mathcal{C}_i\}_{i \in \mathcal{L}}$ are URA codes (described in Section IV-D) with rank accumulation
profile $\{a_i,\ i \in [r + \delta - 1]\}$.

Note that since URA codes are erasure optimal (see Remark 3), q-dim($\mathcal{C}_i$) $= r$, $\forall\, i \in \mathcal{L}$, and hence $a_{r+1}, \ldots, a_{n_L}$
are all zeros. We use $n_L$, $K_L$ to denote the block length and scalar dimension of the local codes $\mathcal{C}_i$ respectively,
i.e.,

$$ n_L = r + \delta - 1, \qquad K_L = \sum_{i=1}^{r} a_i. $$

We will now present an upper bound on the minimum distance $d_{\min}$ of the code $\mathcal{C}$, whenever $\mathcal{C} \in \mathcal{U}$. Subsequently,
under certain assumptions, we also identify necessary conditions for optimality with respect to the minimum distance
bound. We begin by introducing some terminology that will be used to describe the bound.



A. Sub-Additivity 

Let us extend the finite length vector $(a_1, a_2, \ldots, a_{n_L})$ to a periodic semi-infinite sequence $\{a_i\}_{i=1}^{\infty}$ of period
$n_L$ by defining

$$ a_{i + j n_L} = a_i, \quad 1 \leq i \leq n_L, \ j \geq 1. \tag{43} $$

Let $P(\cdot)$ denote the sequence of leading sums of this semi-infinite sequence, i.e.,

$$ P(s) = \sum_{i=1}^{s} a_i, \quad s \geq 1. \tag{44} $$

It follows from the periodicity of $\{a_i\}_{i=1}^{\infty}$ that$^{5}$

$$ P(u_1 n_L + u_0) = u_1 K_L + P(u_0), \quad u_1 \geq 0, \ 1 \leq u_0 \leq n_L. \tag{45} $$

5 It turns out to be more convenient to have the range of $u_0$ as $1 \leq u_0 \leq n_L$, as opposed to the more conventional range $0 \leq u_0 \leq (n_L - 1)$.

With respect to the finite length vector $(a_1, a_2, \ldots, a_{n_L})$, let $Q(\cdot)$ represent the trailing-sum function given by

$$ Q(s) = \sum_{i = n_L - (s - 1)}^{n_L} a_i, \quad 1 \leq s \leq n_L. \tag{46} $$

We extend the definitions of $P(\cdot)$, $Q(\cdot)$ by setting $P(0) = Q(0) = 0$. It can be verified that

1) For $s$ in the range $0 \leq s \leq n_L$, $P(s) \geq Q(s)$,

2) $P(\cdot)$ is sub-additive, i.e.,

$$ P(s + s') \leq P(s) + P(s'), \quad \text{for all } s, s' \geq 0, \tag{47} $$

3) the sum $P(s) + Q(s')$ satisfies

$$ P(s) + Q(s') \leq P(s + s'), \quad \text{for all } s \geq 0, \ 0 \leq s' \leq n_L. \tag{48} $$

We next define the function $P^{(\text{inv})}$ by setting $P^{(\text{inv})}(\nu)$, for $\nu \geq 1$, to be the smallest integer $s$ such that $P(s) \geq \nu$,
i.e., $P^{(\text{inv})}(\nu) = s$, where $s > 0$ is uniquely determined from $P(s - 1) < \nu \leq P(s)$. It can be verified that

$$ P^{(\text{inv})}(v_1 K_L + v_0) = v_1 n_L + P^{(\text{inv})}(v_0), \quad v_1 \geq 0, \ 1 \leq v_0 \leq K_L. \tag{49} $$

As a special case, it follows that

$$ P^{(\text{inv})}(v_1 K_L) = (v_1 - 1) n_L + r. \tag{50} $$
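The functions $P$, $Q$ and $P^{(\text{inv})}$ are simple to compute from a rank accumulation profile, and the properties (45), (47), (48), (49) and (50) can be spot-checked numerically. The sketch below is an illustration only; the function names are hypothetical, and the sample profile $(3, 2, 1, 0)$ is an assumed MBR-type local profile with $r = 3$, $\delta = 2$, $\alpha = 3$.

```python
def leading_sum(a, s):
    """P(s): sum of the first s terms of the periodic extension of profile a."""
    n_L, K_L = len(a), sum(a)
    full_periods, remainder = divmod(s, n_L)
    return full_periods * K_L + sum(a[:remainder])


def trailing_sum(a, s):
    """Q(s): sum of the last s entries of the profile, for 0 <= s <= len(a)."""
    return sum(a[len(a) - s:])


def leading_sum_inverse(a, v):
    """P_inv(v): the smallest s >= 1 with P(s) >= v, for v >= 1."""
    s = 1
    while leading_sum(a, s) < v:
        s += 1
    return s


profile = [3, 2, 1, 0]          # assumed example: n_L = 4, K_L = 6, r = 3
n_L, K_L, r = 4, 6, 3

sub_additive = all(
    leading_sum(profile, s + t) <= leading_sum(profile, s) + leading_sum(profile, t)
    for s in range(0, 13) for t in range(0, 13)
)
trailing_property = all(
    leading_sum(profile, s) + trailing_sum(profile, t) <= leading_sum(profile, s + t)
    for s in range(0, 13) for t in range(0, n_L + 1)
)
special_case = all(
    leading_sum_inverse(profile, v1 * K_L) == (v1 - 1) * n_L + r
    for v1 in range(1, 5)
)
```

Both inequalities hold here because the profile is non-increasing, so any window of consecutive terms of the periodic sequence sums to at least the trailing sum and at most the leading sum of the same length.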

B. Upper Bound on Minimum Distance Under Exact Information Locality and Uniform Rank Accumulation 

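Theorem 5.1: Let $\mathcal{C}$ belong to class $\mathcal{U}$. Then the minimum distance of $\mathcal{C}$ is upper bounded by

$$ d_{\min} \leq n - P^{(\text{inv})}(K) + 1. \tag{51} $$

When $K_L \mid K$, the bound takes on the form

$$ d_{\min} \leq n - \frac{K}{K_L}\, r + 1 - \left( \frac{K}{K_L} - 1 \right)(\delta - 1). \tag{52} $$

Corollary 5.2: Let $\mathcal{C}$ belong to class $\mathcal{U}$. Then given $n$, $d_{\min}$, the scalar dimension of $\mathcal{C}$ is upper bounded by

$$ K \leq P(n - d_{\min} + 1). $$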
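As a numerical illustration of Theorem 5.1, the sketch below evaluates the bound (51) from a rank accumulation profile and compares it with the closed form (52) when $K_L \mid K$. The helper names and the two sample profiles (an MSR-type profile $(2, 2, 0)$ and an MBR-type profile $(2, 1, 0)$, both with $r = 2$, $\delta = 2$) are assumptions chosen for illustration.

```python
def leading_sum(a, s):
    """P(s) for the periodic extension of the local rank accumulation profile a."""
    full_periods, remainder = divmod(s, len(a))
    return full_periods * sum(a) + sum(a[:remainder])


def leading_sum_inverse(a, v):
    """Smallest s >= 1 such that P(s) >= v."""
    s = 1
    while leading_sum(a, s) < v:
        s += 1
    return s


def distance_bound(n, K, a):
    """Right-hand side of (51): n - P_inv(K) + 1."""
    return n - leading_sum_inverse(a, K) + 1


def distance_bound_closed_form(n, K, r, delta, K_L):
    """Right-hand side of (52); only meaningful when K_L divides K."""
    assert K % K_L == 0
    m = K // K_L
    return n - m * r + 1 - (m - 1) * (delta - 1)


msr_profile = [2, 2, 0]   # alpha = 2, r = 2, delta = 2, K_L = 4
mbr_profile = [2, 1, 0]   # alpha = 2, r = 2, delta = 2, K_L = 3

msr_bound = distance_bound(6, 8, msr_profile)
mbr_bound = distance_bound(6, 6, mbr_profile)
max_K_msr = leading_sum(msr_profile, 6 - 2 + 1)   # Corollary 5.2 with d_min = 2
```

With $n = 6$ and $d_{\min} = 2$, Corollary 5.2 allows a scalar dimension of up to 8 for the first profile, which matches the value of $K$ used in `msr_bound`.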

We say that $\mathcal{C}$ is distance-optimal if $d_{\min} = n - P^{(\text{inv})}(K) + 1$ and rate-optimal if $K = P(n - d_{\min} + 1)$.
The following lemma, which is the analog of Fact 1 of Q for the case of vector codes, is used in the proof
of Theorem 5.1.

Lemma 5.3: Given any set $T \subseteq [n]$ such that $\text{rank}(G|_T) < K$, we have

$$ d_{\min} \leq n - |T| \tag{53} $$

with equality iff $T \subseteq [n]$ is of largest size such that $\text{Rank}(G|_T) < K$.

Proof of Theorem 5.1: The proof proceeds along the lines of the proof of Theorem 3.1. We begin by applying
Algorithm 1 below to construct a large set $T \subseteq [n]$ such that $\text{rank}(G|_T) < K$. The subspaces $\{V_i\}$ appearing in
the algorithm correspond to the column-space of $G|_{S_i}$, where $S_i$ denotes the support of the local code $\mathcal{C}_i$, i.e.,

$$ V_i = \sum_{\ell \in S_i} W_\ell, $$

where we recall $W_\ell$ to be the span of the $\alpha$ thin columns comprising the $\ell$-th thick column of $G$.

Let Algorithm 1 run to $J$ iterations. Let $s_j$, $\nu_j$, $1 \leq j \leq J$, denote the incremental support size and rank
respectively, i.e.,

$$ s_j = |T_j| - |T_{j-1}|, \tag{54} $$
$$ \nu_j = \text{Rank}(G|_{T_j}) - \text{Rank}(G|_{T_{j-1}}). \tag{55} $$

Algorithm 1 Used in the Proof of Theorem 5.1

    Let $T_0 = \{\,\}$, $j = 0$
    while 1 do
        Pick $i \in \mathcal{L}$ such that $V_i \not\subseteq \sum_{\ell \in T_j} W_\ell$
        if $\text{Rank}(G|_{T_j \cup S_i}) < K$ then
            $j = j + 1$
            $T_j = T_{j-1} \cup S_i$
        else if $\text{Rank}(G|_{T_j \cup S_i}) = K$ then
            Pick any maximal subset $S_{\text{end}}$ of $S_i$ such that $\text{Rank}(G|_{T_j \cup S_{\text{end}}}) < K$
            $\nu_{\text{end}} = K - \text{Rank}(G|_{T_j \cup S_{\text{end}}})$
            $j = j + 1$
            $T_j = T_{j-1} \cup S_{\text{end}}$
            Exit
        end if
    end while


Note further that $S_{\text{end}}$ is chosen in such a way that there exists a choice of thick column in the last stage such that
adding this thick column to the support will cause the accumulated rank to equal $K$. Let the integer $\sigma$ represent
the amount of overlap in support between the final local code and the prior $J - 1$ local codes, i.e.,

$$ \sigma = |T_{J-1} \cap S_{\text{end}}|. \tag{56} $$

Note that under the algorithm, it is possible that the final incremental support $s_J = 0$. In addition, the sum $\sigma + s_J$
is upper bounded by $(r - 1)$. This last statement follows by first noting that the local codes are erasure optimal
with q-dim($\mathcal{C}_i$) $= r$, and hence if $\sigma + s_J = r$, then this will result in rank $K$ after the last step of the
algorithm (but the rank cannot reach $K$ in the algorithm). The cumulative rank added (assuming $S_{\text{end}}$ and one more
thick column) is then upper bounded by

$$ K = \sum_{j=1}^{J-1} \nu_j + (\nu_J + \nu_{\text{end}}) \tag{57} $$
$$ \leq \sum_{j=1}^{J-1} Q(s_j) + \left( P(\sigma + s_J + 1) - P(\sigma) \right) \tag{58} $$
$$ \overset{(a)}{\leq} \sum_{j=1}^{J-1} Q(s_j) + P(s_J + 1) \tag{59} $$
$$ = \sum_{j=1}^{J-2} Q(s_j) + \left( Q(s_{J-1}) + P(s_J + 1) \right) \tag{60} $$
$$ \overset{(b)}{\leq} \sum_{j=1}^{J-2} Q(s_j) + P(s_{J-1} + s_J + 1) \tag{61} $$
$$ \leq P\Big( \sum_{j=1}^{J} s_j + 1 \Big), \tag{62} $$

where (a) and (b) respectively follow from (47) and (48). Hence it follows that

$$ |T_J| + 1 = \sum_{j=1}^{J} s_j + 1 \geq P^{(\text{inv})}(K), \tag{63} $$

so that

$$ |T_J| \geq P^{(\text{inv})}(K) - 1, \tag{64} $$

which from Lemma 5.3 leads to

$$ d_{\min} \leq n - P^{(\text{inv})}(K) + 1. \tag{65} $$

When $K_L \mid K$, this simplifies to

$$ \begin{aligned} d_{\min} &\leq n - P^{(\text{inv})}(K) + 1 \\ &\overset{(a)}{=} n - \left( \frac{K}{K_L} - 1 \right) n_L - r + 1 \\ &= n - \left( \frac{K}{K_L} - 1 \right)(r + \delta - 1) - r + 1, \end{aligned} $$

where (a) follows from (50).



C. Structure of Optimal Codes 

Definition 7: We will say that the leading-sum function $P(\cdot)$ is strictly sub-additive in the range $[n_L]$ if, for any
$s \geq 1$, $s' \geq 1$ such that $s + s' \leq n_L$, we have $P(s + s') < P(s) + P(s')$.

It can be easily verified that a necessary and sufficient condition for strict sub-additivity is that $a_1 > a_2$.

Theorem 5.4: Let $\mathcal{C}$ belong to class $\mathcal{U}$ and also assume that $\mathcal{C}$ is both distance and rate optimal, i.e., $d_{\min} =
n - P^{(\text{inv})}(K) + 1$ and $K = P(n - d_{\min} + 1)$. Also, let $u_1 = \lceil \frac{K}{K_L} \rceil - 1$. The following observations can be made
regarding the structure of the local codes:

(a) If the leading-sum function $P$ is strictly sub-additive in the range $[n_L]$, then

(i) the local codes $\{\mathcal{C}_i\}_{i \in \mathcal{L}}$ must all have disjoint supports, i.e., $S_i \cap S_j = \emptyset$, $\forall\, i, j \in \mathcal{L}$, $i \neq j$, and

(ii) for distinct i\, • • • i Ul , im+l G £> it must be that 

/ \ 



v ie n 



E^ 

=i 

u 

( 



0, V1<K ui, and 



\ 3 m ) 



Ui + l 

E 



Vl<Kai + l. 



(b) If the scalar dimension $K$ is a multiple of $K_L$, i.e., $K_L \mid K$, then once again

(i) the local codes $\{\mathcal{C}_i\}_{i \in \mathcal{L}}$ must all have disjoint supports.

(ii) Furthermore, for distinct $i_1, i_2, \ldots, i_{u_1}, i_{u_1+1} \in \mathcal{L}$, it must be that

$$ V_{i_\ell} \cap \Big( \sum_{j=1,\, j \neq \ell}^{u_1+1} V_{i_j} \Big) = 0, \quad \forall\ 1 \leq \ell \leq u_1 + 1. \tag{68} $$



Proof: We are given that

$$ d_{\min} = n - P^{(\text{inv})}(K) + 1, \tag{69} $$
$$ K = P(n - d_{\min} + 1). \tag{70} $$

Referring back to the proof of Theorem 5.1, we then see that (65) is an equality, and hence so must be (63) and
(64). As a result, we get that $\sum_{j=1}^{J} s_j + 1 = P^{(\text{inv})}(K)$, which implies that

$$ P\Big( \sum_{j=1}^{J} s_j + 1 \Big) = P(P^{(\text{inv})}(K)) \overset{(a)}{=} P(n - d_{\min} + 1) \overset{(b)}{=} K, \tag{71} $$

where (a) and (b) respectively follow from (69) and (70). This means that the chain of inequalities (58)-(62) has
equality at every step. The chain is reproduced below for convenience of further analysis:

$$ K = \sum_{j=1}^{J-1} \nu_j + (\nu_J + \nu_{\text{end}}) \tag{72} $$
$$ \overset{(i)}{\leq} \sum_{j=1}^{J-1} Q(s_j) + P(\sigma + s_J + 1) - P(\sigma) \tag{73} $$
$$ \overset{(ii)}{\leq} \sum_{j=1}^{J-1} Q(s_j) + P(s_J + 1) \tag{74} $$
$$ = \sum_{j=1}^{J-2} Q(s_j) + Q(s_{J-1}) + P(s_J + 1) \tag{75} $$
$$ \overset{(iii)}{\leq} \sum_{j=1}^{J-2} Q(s_j) + P(s_{J-1} + s_J + 1) \tag{76} $$
$$ \overset{(iv)}{\leq} P\Big( \sum_{j=1}^{J} s_j + 1 \Big) \tag{77} $$
$$ = K. \tag{78} $$

Also let $u_0$ be such that $P^{(\text{inv})}(K) = \sum_{j=1}^{J} s_j + 1 = u_1 n_L + u_0$, where $u_1 = \lceil \frac{K}{K_L} \rceil - 1$. Then note that, since
$P^{(\text{inv})}(P(\sum_{j=1}^{J} s_j + 1)) = \sum_{j=1}^{J} s_j + 1$, we get that $u_0$ must be in the range $1 \leq u_0 \leq r$. We now analyze the
conditions for various equalities in the above chain for the two cases: (a) $P$ is strictly sub-additive in the range
$[n_L]$ and (b) $K_L \mid K$.

(a) Assume that $P$ is strictly sub-additive in the range $[n_L]$. Equality in (ii), coupled with the strict sub-additivity of
$P$, implies that $\sigma = 0$, implying that the last code added was support disjoint from the rest. Towards analyzing
the equality conditions in (iii) and (iv), we first note that for $s \geq 0$, $\delta \leq s' \leq n_L$, the equality

$$ P(s) + Q(s') = P(s + s') $$

can happen for a strictly sub-additive $P$ iff either $s' = n_L$ or else $s + s'$ is a multiple of $n_L$. A little thought will
now show that equality can hold in (iii), (iv) iff either $s_j = n_L$, $1 \leq j \leq J - 1$, or if there exists $1 \leq \ell \leq J - 1$
such that

$$ s_\ell + 1 + s_J = n_L, $$
$$ s_j = n_L, \quad 1 \leq j \leq (J - 1), \ j \neq \ell. $$

In the latter case, this would imply that $\sum_{j=1}^{J} s_j + 1$ is a multiple of $n_L$, which we realize as a contradiction by
noting from the above discussion that $1 \leq u_0 \leq r$. Thus, we get that $s_j = n_L$, $1 \leq j \leq J - 1$, and $s_J + 1 = u_0$.
It then follows from this that $J = u_1 + 1$ and the first $J - 1$ ($= u_1$) local codes are support disjoint. From
our earlier observation, even the last local code was support disjoint; hence it follows that the $J$ local codes
encountered in Algorithm 1 are support disjoint. Now, the fact that the local codes not encountered during
Algorithm 1 also have disjoint supports can be proved in a manner similar to the proof of Part (b) of Theorem 3.2.

We proceed next to prove the claims in (66) and (67). Towards this, first note that equality in (i) in the above chain, coupled with the previous observation that $s_j = n_L$, $1 \le j \le J - 1$ and $s_J + 1 = u_0$, implies that

$$ \nu_j = K_L, \quad 1 \le j \le J - 1, $$
$$ \nu_J + \nu_{\mathrm{end}} = P(u_0). $$



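Thus if $i_1, i_2, \ldots, i_{u_1+1} \in \mathcal{L}$ are such that $\mathcal{C}_{i_j}$ was picked in the $j$-th step, $1 \le j \le J = u_1 + 1$, of Algorithm 1, then it follows that

$$ V_{i_\ell} \cap \Big( \sum_{j=1,\, j \neq \ell}^{u_1} V_{i_j} \Big) = 0, \quad \forall\, 1 \le \ell \le u_1, $$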



and that, 



Ml+l 



v l < t <«i + i. 



This is because we have $\dim\big(\sum_{j=1}^{u_1+1} V_{i_j}\big) = K > u_1 K_L$ and $\dim(V_{i_j}) = K_L$, $1 \le j \le u_1 + 1$. The fact that the above observations hold good for any set of ordered indices $i_1, i_2, \ldots, i_{u_1}, i_{u_1+1}$ belonging to $\mathcal{L}$ can be proved in a manner similar to the proof of Part (c) of Theorem 3.2.



(b) We next consider the case when $K_L \mid K$ and analyze the conditions for the various equalities in the chain (72)-(78). First of all, note that since $K_L \mid K$, $P^{(\mathrm{inv})}(K) = u_1 n_L + r$ (i.e., $u_0 = r$). Next, we note that since $\nu_j \le K_L$, $1 \le j \le J - 1$, it follows that

$$ J \ge \frac{K}{K_L} = u_1 + 1. \qquad (79) $$

We will next show that the number $J$ of iterations in the algorithm equals $u_1 + 1$. Towards this, consider the inequalities (iii) and (iv) in the above chain. It can be shown that for any $s = q_1 n_L + q_0$, $q_1 \ge 0$, $1 \le q_0 \le n_L$ and $s'$ such that $\delta \le s' \le n_L$, the equality

$$ P(s) + Q(s') = P(s + s') $$

can happen only if $s + s' \ge (q_1 + 1) n_L$. It then follows that equalities in (iii) and (iv) can happen only if $\sum_{j=1}^{J} s_j + 1 = u_1 n_L + r \ge (J - 1) n_L$, which gives us that $J \le u_1 + 1$. When coupled with (79), we obtain that $J = u_1 + 1$ and hence

$$ \sum_{j=1}^{J-1} s_j + (s_J + 1) = (J - 1) n_L + r. $$

Further, noting that $1 \le s_j \le n_L$, $1 \le j \le J - 1$ and $1 \le s_J + 1 \le r$, it follows that

$$ s_j = \begin{cases} n_L, & \text{if } 1 \le j \le J - 1, \\ r - 1, & \text{if } j = J. \end{cases} \qquad (80) $$

Also, recall that $\sigma + s_J \le r - 1$, and thus we get that $\sigma = 0$. Hence it follows that the $J$ local codes encountered in Algorithm 1 are support disjoint. Moreover, equality in (i) in the chain, along with the above observation regarding disjointness, also implies that

$$ \nu_j = Q(n_L) = K_L, \quad 1 \le j \le J - 1, $$
$$ \nu_J + \nu_{\mathrm{end}} = P(r) = K_L. $$
We note that this implies that if $i_1, i_2, \ldots, i_{u_1}, i_{u_1+1} \in \mathcal{L}$ are such that $\mathcal{C}_{i_j}$ was picked in the $j$-th step of Algorithm 1, then it follows that

$$ V_{i_\ell} \cap \Big( \sum_{j=1,\, j \neq \ell}^{u_1+1} V_{i_j} \Big) = 0, \quad \forall\, 1 \le \ell \le u_1 + 1. $$

The rest of the proof follows along the same lines as the proof of Part (a).



VI. MSR-Local Codes

We show in this and the next section how it is possible to construct vector codes with locality such that the constituent local codes are regenerating codes, thereby simplifying node repair in two respects. Node repair can be carried out, on average, by accessing a small number of nodes while downloading an amount of data that is not much more than what the data node stores. The present section will focus on the construction of optimal codes with information locality in which the local codes are MSR codes. A more formal definition appears below.

Definition 8: Let $\mathcal{C}$ be an $[n, K, d_{\min}, \alpha]$ vector code over $\mathbb{F}_q$ possessing $(r, \delta)$ information locality. Let $G$ be the generator matrix for the code. Then $\mathcal{C}$ is said to be an MSR-local code with $(r, \delta)$ information locality, provided

• the code $\mathcal{C}$ can be punctured so as to yield $m$ local codes $\mathcal{C}_i$ in which the $i$-th local code is an $(n_L, r, d)$-MSR code with $n_L = (r + \delta - 1)$,

• and if the $i$-th local code has support $S_i$, and $S = \cup_{i=1}^{m} S_i$, then

$$ \mathrm{Rank}(G|_S) = K. $$

If in addition, $S = [n]$, we will say that $\mathcal{C}$ is an MSR-local code with $(r, \delta)$ all-symbol locality.

For convenience, we will simply write MSR-local code in place of MSR-local code with $(r, \delta)$ information locality and all-symbol MSR-local code in place of MSR-local code with $(r, \delta)$ all-symbol locality.

Four constructions of MSR-local codes are presented in this section, of which the first two are explicit. The third construction will prove the existence, over large enough fields, of MSR-local codes for a wider range of code parameters than is possible under the two explicit constructions. The fourth construction will establish the existence of all-symbol MSR-local codes whenever $n_L \mid n$. Throughout this section we will assume that $\delta \ge 3$, as it turns out that $\delta = 2$ results in codes where the local codes have trivial regeneration ($\beta = \alpha$).

A. MSR Codes and Uniform Rank Accumulation

Let $\mathcal{B}$ be an $((n_L, r, d), (\alpha, \beta), K_L)$ MSR code. It can be seen either from the rank accumulation profile of MSR codes presented in O or from the fact that MSR codes are vector MDS codes that $\mathcal{B}$ has uniform rank accumulation. The rank accumulation profile $\{a_i\}$ is given by

$$ a_i = \begin{cases} \alpha, & 1 \le i \le r, \\ 0, & r + 1 \le i \le n_L. \end{cases} $$



It follows that

$$ K_L = \sum_{i=1}^{n_L} a_i = r\alpha. $$



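Next, let

$$ K = v_1 K_L + v_0 = v_1 (r\alpha) + v_0, \quad 1 \le v_0 \le K_L. $$

Then we have that

$$ \begin{aligned}
P^{(\mathrm{inv})}(K) &= v_1 n_L + P^{(\mathrm{inv})}(v_0) \\
&= v_1 n_L + \left\lceil \frac{v_0}{\alpha} \right\rceil \\
&= v_1 (\delta - 1) + v_1 r + \left\lceil \frac{v_0}{\alpha} \right\rceil \\
&= v_1 (\delta - 1) + \left\lceil \frac{K}{\alpha} \right\rceil \\
&= \left( \left\lceil \frac{K}{r\alpha} \right\rceil - 1 \right)(\delta - 1) + \left\lceil \frac{K}{\alpha} \right\rceil .
\end{aligned} $$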


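It follows that for codes with exact $(r, \delta)$-MSR locality, we have that

$$ d_{\min} \le n + 1 - P^{(\mathrm{inv})}(K) = n - \left\lceil \frac{K}{\alpha} \right\rceil + 1 - \left( \left\lceil \frac{K}{r\alpha} \right\rceil - 1 \right)(\delta - 1). \qquad (81) $$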
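The derivation above can be checked numerically. The following is a minimal sketch, assuming the uniform rank accumulation profile $a_i = \alpha$ for $1 \le i \le r$ and $a_i = 0$ otherwise, repeated with period $n_L = r + \delta - 1$. The function names `p_inv_msr` and `msr_local_dmin_bound` are hypothetical helpers introduced only for this illustration.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def p_inv_msr(K, r, delta, alpha):
    """P^(inv)(K): smallest number of nodes whose accumulated rank reaches K,
    assuming each block of n_L nodes contributes alpha for its first r nodes."""
    n_L = r + delta - 1
    K_L = r * alpha
    v1 = ceil_div(K, K_L) - 1      # number of fully used local blocks
    v0 = K - v1 * K_L              # remaining rank, 1 <= v0 <= K_L
    return v1 * n_L + ceil_div(v0, alpha)


def msr_local_dmin_bound(n, K, r, delta, alpha):
    """Right-hand side of the K-bound (81)."""
    return (n - ceil_div(K, alpha) + 1
            - (ceil_div(K, r * alpha) - 1) * (delta - 1))


if __name__ == "__main__":
    # Both expressions for the bound agree on a small example.
    n, K, r, delta, alpha = 20, 17, 3, 3, 2
    print(n + 1 - p_inv_msr(K, r, delta, alpha),
          msr_local_dmin_bound(n, K, r, delta, alpha))
```

The two functions compute the bound by the two routes used in the text: directly through $P^{(\mathrm{inv})}$ and through the closed form.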



Remark 5: Assuming that it is possible to construct codes that satisfy the bound on $d_{\min}$ in (81) for any given value of $K$, we see that the largest scalar dimension for a given value of $d_{\min}$ results when $\alpha$ divides $K$. All MSR-local codes presented in this section achieve the bound on $d_{\min}$ of (81) and have $\alpha \mid K$ and hence are rate optimal.



B. Sum-Parity MSR-Local Codes 

Construction 6.1: Let $\mathcal{C}_0$ be an $((n_L + \Delta, r, d), (\alpha, \beta))$ MSR code with $n_L = (r + \delta - 1)$ such that $d \le r + \delta - 2$. Let $G_0 = [G_L \mid Q_\Delta]$ be a generator matrix of $\mathcal{C}_0$, where $G_L$ and $Q_\Delta$ are matrices of size $(r\alpha \times n_L\alpha)$ and $(r\alpha \times \Delta\alpha)$ respectively. By Lemma 4.6 we know that the matrix $G_L$ generates an $((n_L, r, d), (\alpha, \beta))$ MSR code obtained by puncturing $\mathcal{C}_0$ in the symbols associated with the matrix $Q_\Delta$. Next consider the code $\mathcal{C}$ with generator matrix $G$ given by

$$ G = \begin{bmatrix} G_L & & & Q_\Delta \\ & \ddots & & \vdots \\ & & G_L & Q_\Delta \end{bmatrix}, \qquad (82) $$

in which both matrices $G_L$ and $Q_\Delta$ appear $m \ge 1$ times.

The theorem below identifies the parameters of the code so constructed and proves the construction to yield MSR-local codes with minimum distance $d_{\min}$ achieving the bound given in (81) whenever $\delta \ge \Delta$.



Theorem 6.2: Consider the code $\mathcal{C}$ constructed in Construction 6.1 in which the parameters $\delta, \Delta$ are chosen such that $\delta \ge \Delta$. Then the code $\mathcal{C}$ is an MSR-local code with $(r, \delta)$ information locality, and has

(a) length $n = m n_L + \Delta$ and vector-size parameter $\alpha$,

(b) $m$ support-disjoint local codes each of which is MSR with parameters $((n_L, r, d), (\alpha, \beta))$ and possessing generator matrix $G_L$,

(c) scalar dimension $K = mr\alpha$ and

(d) minimum distance $d_{\min}$ satisfying the $K$-bound (81),

$$ \begin{aligned}
d_{\min} &= n - \left\lceil \frac{K}{\alpha} \right\rceil + 1 - \left( \left\lceil \frac{K}{r\alpha} \right\rceil - 1 \right)(\delta - 1) && (83) \\
&= \delta + \Delta. && (84)
\end{aligned} $$



Proof: (a), (b), (c) are evident from the construction. To prove (d), we will first show the equivalence between the two expressions for $d_{\min}$ provided. Since $K = mr\alpha$ and $n = m(r + \delta - 1) + \Delta$, we have that

$$ \begin{aligned}
n - \left\lceil \frac{K}{\alpha} \right\rceil + 1 - \left( \left\lceil \frac{K}{r\alpha} \right\rceil - 1 \right)(\delta - 1) &= n - mr + 1 - (m - 1)(\delta - 1) && (85) \\
&= \Delta + \delta. && (86)
\end{aligned} $$

Thus, it suffices to show that any non-zero codeword $\mathbf{c}$ has Hamming weight $\mathrm{wt}(\mathbf{c}) \ge \delta + \Delta$. First of all, note that if $\mathbf{c}$ has non-zero components belonging to two or more local codes, then clearly $\mathrm{wt}(\mathbf{c}) \ge 2\delta \ge \delta + \Delta$, since all local codes themselves have minimum distance $\delta$ and $\delta \ge \Delta$ by hypothesis. Next, consider the complementary case where the non-zero components of $\mathbf{c}$ are restricted to one of the local codes and the global parities. By inspecting the generator matrix $G$ given in (82), it can be seen that when the all-zero code symbols corresponding to the remaining $(m - 1)$ local codes are deleted from each codeword, the resultant punctured codeword lies in the row-space of $G_0 = [G_L \mid Q_\Delta]$. The proof now follows by noting that $G_0$ generates an MSR code of minimum distance $\delta + \Delta$.



C. Pyramid-Like MSR-Local Codes 

The construction below mimics the construction of pyramid codes in GUI , with the difference that we are now 
dealing with vector symbols in place of scalars and local MSR codes in place of local MDS codes. 

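Construction 6.3: Let $\mathcal{C}'$ be an $((n' = mr + \delta - 1 + \Delta, k' = mr, d), (\alpha, \beta))$ exact-repair MSR code such that $d \le n' - \Delta - 1 = mr + \delta - 2$. Let the (systematic) generator matrix $G'$ of $\mathcal{C}'$ be given by

$$ G' = [\, I_{mr\alpha} \mid Q \mid Q' \,], \qquad (87) $$

where $I_{mr\alpha}$ denotes an identity matrix of size $mr\alpha$ and the matrices $Q$, $Q'$ are respectively of size $(mr\alpha \times (\delta - 1)\alpha)$ and $(mr\alpha \times \Delta\alpha)$. From Lemma 4.6 it follows that the "punctured" generator matrix $G'' = [I_{mr\alpha} \mid Q]$ generates an $((n' - \Delta, k', d), (\alpha, \beta))$ MSR code; let us call it $\mathcal{C}''$. Let the matrix $G''$ be represented in block-matrix form as shown below:

$$ G'' = [\, I_{mr\alpha} \mid Q \,] = \begin{bmatrix} I_{r\alpha} & & & Q_1 \\ & \ddots & & \vdots \\ & & I_{r\alpha} & Q_m \end{bmatrix}, \qquad (88) $$

where $Q_i$, $1 \le i \le m$, are matrices of size $(r\alpha \times (\delta - 1)\alpha)$. Then the generator matrix $G$ of the desired code $\mathcal{C}$ is obtained by splitting and rearranging the columns of $Q$, as shown below:

$$ G = \left[ \begin{array}{ccccc|c} I_{r\alpha} & Q_1 & & & & \\ & & \ddots & & & Q' \\ & & & I_{r\alpha} & Q_m & \end{array} \right]. \qquad (89) $$

Clearly, the code $\mathcal{C}$ has $(r, \delta)$ information locality, where the local codes are generated by $[I_{r\alpha} \mid Q_i]$, $i \in [m]$. It can also be observed that all the local codes are shortened codes of $\mathcal{C}''$ and from Lemma 4.7 it follows that these are all MSR. Thus, we conclude that the code $\mathcal{C}$ is an MSR-local code.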
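The column rearrangement in Construction 6.3 can be made concrete with a small layout sketch. The code below only illustrates the block structure of (89): the entries are placeholder integers rather than elements of an actual MSR code, and the function name `pyramid_like_generator` and its argument layout are assumptions made for this example.

```python
def pyramid_like_generator(Q_blocks, Qp_blocks):
    """Assemble the block layout of G in (89).

    Q_blocks[i]  : matrix Q_i, r*alpha rows by (delta-1)*alpha columns.
    Qp_blocks[i] : the rows of Q' belonging to the i-th local code,
                   r*alpha rows by Delta*alpha columns.
    Entries are placeholders; no field arithmetic is performed here.
    """
    m = len(Q_blocks)
    ra = len(Q_blocks[0])          # r * alpha rows per local code
    q_w = len(Q_blocks[0][0])      # (delta - 1) * alpha local parity columns
    local_w = ra + q_w             # width of one local block [I | Q_i]
    G = []
    for i in range(m):
        for t in range(ra):
            row = [0] * (m * local_w)
            row[i * local_w + t] = 1                    # identity part
            start = i * local_w + ra
            row[start:start + q_w] = Q_blocks[i][t]     # local parities Q_i
            G.append(row + list(Qp_blocks[i][t]))       # global parities Q'
    return G


if __name__ == "__main__":
    # m = 2 local codes, r*alpha = 2, (delta-1)*alpha = 1, Delta*alpha = 1.
    Q = [[[2], [3]], [[4], [5]]]
    Qp = [[[6], [7]], [[8], [9]]]
    for row in pyramid_like_generator(Q, Qp):
        print(row)
```

Each local block $[I_{r\alpha} \mid Q_i]$ occupies its own set of columns, while the columns of $Q'$ are shared by all row blocks, which is what makes the local codes support disjoint.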

The theorem below identifies the parameters of the code so constructed, and proves optimality with respect to 
minimum distance. 



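Theorem 6.4: Construction 6.3 gives us an MSR-local code with $(r, \delta)$ information locality, and parameters

(a) $K = mr\alpha$, $n = m(r + \delta - 1) + \Delta$, $\alpha = (d - r + 1)\beta$,

(b) $d_{\min}$ satisfying

$$ d_{\min} = n - \left\lceil \frac{K}{\alpha} \right\rceil + 1 - \left( \left\lceil \frac{K}{r\alpha} \right\rceil - 1 \right)(\delta - 1). $$

Thus the code is optimal with respect to the $K$-bound in (81) on minimum distance.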



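Proof: As in the proof of Theorem 6.2 it suffices to prove that

$$ \begin{aligned}
d_{\min} &\ge n - \left\lceil \frac{K}{\alpha} \right\rceil + 1 - \left( \left\lceil \frac{K}{r\alpha} \right\rceil - 1 \right)(\delta - 1) && (90) \\
&= \delta + \Delta. && (91)
\end{aligned} $$

However, by inspecting the generator matrices $G$ and $G'$, it is clear that the minimum distance of $\mathcal{C}$ is no less than that of $\mathcal{C}'$. The theorem now follows by noting that $\mathcal{C}'$ is an MSR code with minimum distance $d_{\min}(\mathcal{C}') = \delta + \Delta$.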
Remark 6: The existence of MSR codes for all possible $[n, k, d]$ has been shown in [14] and these codes could be used as the codes $\mathcal{C}_0$, $\mathcal{C}'$ in the two constructions above. In terms of known, explicit constructions, the code $\mathcal{C}_0$ can be picked from the product-matrix class [9] of MSR codes. The product-matrix construction requires $d \ge 2r - 2$, which combined with $d \le r + \delta - 2$, leads to the constraint $r \le \delta$ on the applicability of this construction in Theorem 6.2. When combined with the requirement $d \le mr + \delta - 2$, it leads to the constraint $mr \le \delta$ on the applicability of this construction in Theorem 6.4.
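The two constraints in Remark 6 follow from intersecting the product-matrix requirement $d \ge 2k - 2$ with the upper limit on $d$ in each construction. A minimal sketch that enumerates the admissible repair degrees is given below; the function names are hypothetical, and the only facts used are the inequalities quoted in the remark (with $k = r$ for Construction 6.1 and $k = mr$ for Construction 6.3).

```python
def admissible_repair_degrees(k, d_max):
    """Repair degrees d with 2k - 2 <= d <= d_max (empty if none exist)."""
    return list(range(2 * k - 2, d_max + 1))


def sum_parity_degrees(r, delta):
    """Construction 6.1: product-matrix code with k = r and d <= r + delta - 2."""
    return admissible_repair_degrees(r, r + delta - 2)


def pyramid_like_degrees(m, r, delta):
    """Construction 6.3: product-matrix code with k = m*r and d <= m*r + delta - 2."""
    return admissible_repair_degrees(m * r, m * r + delta - 2)


if __name__ == "__main__":
    print(sum_parity_degrees(3, 3))       # r <= delta: at least one choice of d
    print(sum_parity_degrees(4, 3))       # r > delta: no admissible d
    print(pyramid_like_degrees(2, 2, 4))  # m*r <= delta
    print(pyramid_like_degrees(2, 2, 3))  # m*r > delta
```

A non-empty list is returned exactly when $r \le \delta$ (respectively $mr \le \delta$), matching the constraints stated in the remark.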



D. Existence of MSR-Local Codes when K = mra 



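Theorem 6.5: Given the existence of an $((n_L, r, d), (\alpha, \beta))$ exact-repair MSR code with $n_L = (r + \delta - 1)$, there exists an MSR-local code with $(r, \delta)$ information locality over $\mathbb{F}_q$, and $d_{\min}$ achieving the $K$-bound, i.e.,

$$ d_{\min} = n - \left\lceil \frac{K}{\alpha} \right\rceil + 1 - \left( \left\lceil \frac{K}{r\alpha} \right\rceil - 1 \right)(\delta - 1), \qquad (92) $$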



with K = mra, for some integer m > 2, whenever q > ( JM . 
Proof: See Appendix [D] 



We note that unlike in Theorems 6.2 and 6.4, there is no constraint here on the repair degree $d$ involving $r$ and $\delta$, and thus Theorem 6.5 is applicable for a wider range of parameters than are Theorems 6.2 and 6.4.



E. Existence of MSR-Local Codes with All-symbol Locality 

Theorem 6.6: Given the existence of an $((n_L, r, d), (\alpha, \beta))$ exact-repair MSR code, where $n_L = r + \delta - 1$, there exists an $[n, K, d_{\min}, \alpha]$ MSR-local code $\mathcal{C}$ with $(r, \delta)$ all-symbol locality over $\mathbb{F}_q$, such that $d_{\min}$ achieves the $K$-bound with equality

$$ d_{\min} = n - \left\lceil \frac{K}{\alpha} \right\rceil + 1 - \left( \left\lceil \frac{K}{r\alpha} \right\rceil - 1 \right)(\delta - 1), \qquad (93) $$

whenever

• $K = \ell\alpha$ for some positive integer $\ell > r$,

• $n = m n_L$ for some positive integer m > ^ and

• field size q > (") .

Proof: See Appendix E.



VII. MBR-LOCAL CODES 

The present section will focus on the construction of optimal codes with locality in which the local codes are MBR codes. More formally, we have

Definition 9: Let $\mathcal{C}$ be an $[n, K, d_{\min}, \alpha]$ vector code over $\mathbb{F}_q$ possessing exact $(r, \delta)$ information locality. Let $G$ be the generator matrix for the code. Then $\mathcal{C}$ is said to be an MBR-local code with $(r, \delta)$ information locality, provided

• the code $\mathcal{C}$ can be punctured so as to yield $m$ local codes $\mathcal{C}_i$ in which the $i$-th local code is an $(n_L, r, d)$-MBR code with $n_L = (r + \delta - 1)$,

• and that if the $i$-th local code has support $S_i$, and $S = \cup_{i=1}^{m} S_i$, then

$$ \mathrm{Rank}(G|_S) = K. $$

If in addition, $S = [n]$, we will say that $\mathcal{C}$ is an MBR-local code with $(r, \delta)$ all-symbol locality.

As with MSR-local codes, we will write MBR-local code in place of MBR-local code with $(r, \delta)$ information locality and all-symbol MBR-local code in place of MBR-local code with $(r, \delta)$ all-symbol locality.

Two constructions of optimal MBR codes will be presented in this section. The first is an explicit construction of an MBR-local code that can be applied whenever $K_L \mid K$, where $K_L$ and $K$ denote the scalar dimension of the local and global codes respectively. In the second construction, we show the existence of all-symbol MBR-local codes whenever $K_L \mid K$ and in addition, $n_L \mid n$. In both cases, optimality is with respect to the bound

dmin < n ~Y L r + l ~ (94) 



appearing in Theorem 5.1 The MBR codes appearing in both these constructions are the repair-by-transfer MBR 



codes presented in (SI and described in Example [T] of the present paper. 

A. MBR-Local Codes with $(r, \delta)$ Information Locality

Construction 7.1: The aim of this construction is to build an optimal MBR-local code $\mathcal{C}$ with $(r, \delta)$ information locality composed of $m$ support-disjoint MBR codes, each of which is a repair-by-transfer (RBT) $((n_L, r, d), (\alpha, \beta), K_L)$ MBR code, along with $\Delta$ global parity symbols. The parameters of each RBT MBR code satisfy

$$ n_L = r + \delta - 1, \quad \alpha = d = n_L - 1, \quad \beta = 1, \quad K_L = r\alpha - \binom{r}{2}. $$

Thus the desired global code $\mathcal{C}$ will have length $n = m n_L + \Delta$ and scalar rank $K = m K_L$. The construction will proceed in three stages:

Stage 1: Let us define $N_L = \binom{n_L}{2}$ and set $D_L = N_L - K_L + 1$. Then in the first stage, a pyramid code $\mathcal{A}$ (see Section III) with $(K_L, D_L)$ information locality is constructed that is composed of $m$ support-disjoint local codes and $\Delta\alpha$ global parities. In other words, each of the disjoint local codes $\mathcal{A}_i$ in the pyramid code is an MDS code with parameters $[N_L, K_L, D_L]$ and the overall code $\mathcal{A}$ has parameters $[m N_L + \Delta\alpha, m K_L, D_L + \Delta\alpha]$.

Stage 2: In the second stage, the $N_L$ symbols that correspond to the $i$-th local code $\mathcal{A}_i$ are regarded as MDS-coded symbols with parameters $[N_L, K_L, D_L]$ and used to construct a repair-by-transfer $((n_L, r, d), (\alpha, \beta), K_L)$ MBR code.

Stage 3: In the final stage, the $\Delta\alpha$ global parities are collected into $\Delta$ groups of $\alpha$ symbols each, with each group representing the contents of one of the $\Delta$ global parity nodes.

This completes the construction.

An example of Construction 7.1 is illustrated in Figure 5.
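The parameter bookkeeping of Construction 7.1 can be sketched in a few lines. The code below is only an illustration under the assumption that the local scalar dimension is the standard MBR file size for $\beta = 1$, namely $K_L = r\alpha - \binom{r}{2}$, and it writes the local scalar distance as `D_L` $= N_L - K_L + 1$. The function names are hypothetical.

```python
from math import comb


def rbt_mbr_local_params(r, delta):
    """Parameters of one repair-by-transfer MBR local code (beta = 1)."""
    n_L = r + delta - 1
    alpha = n_L - 1                  # alpha = d = n_L - 1
    K_L = r * alpha - comb(r, 2)     # assumed MBR scalar dimension
    N_L = comb(n_L, 2)               # scalar MDS symbols per local code
    D_L = N_L - K_L + 1              # distance of the local [N_L, K_L] MDS code
    return {"n_L": n_L, "alpha": alpha, "K_L": K_L, "N_L": N_L, "D_L": D_L}


def construction_7_1_params(m, r, delta, Delta):
    """Length, scalar dimension, pyramid-code parameters and the d_min bound."""
    p = rbt_mbr_local_params(r, delta)
    n = m * p["n_L"] + Delta
    K = m * p["K_L"]
    pyramid = (m * p["N_L"] + Delta * p["alpha"], K,
               p["D_L"] + Delta * p["alpha"])
    ratio = K // p["K_L"]            # equals m since K_L divides K
    d_bound = n - ratio * r + 1 - (ratio - 1) * (delta - 1)
    return n, K, pyramid, d_bound


if __name__ == "__main__":
    print(rbt_mbr_local_params(2, 3))
    print(construction_7_1_params(2, 2, 3, 1))
```

For every choice of $r$ and $\delta$ one finds $D_L - 1 = \binom{\delta - 1}{2}$, which is the identity the minimum-distance argument for this construction relies on.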

Theorem 7.2: Construction 7.1 results in an MBR-local code $\mathcal{C}$ with $(r, \delta)$ information locality composed of $m$ support-disjoint local codes $\{\mathcal{C}_i\}_{i=1}^{m}$, each of which is a repair-by-transfer (RBT) $((n_L, r, d), (\alpha, \beta), K_L)$ MBR code, along with $\Delta$ global parity symbols. The parameters of each RBT MBR code satisfy

$$ n_L = r + \delta - 1, \quad \alpha = d = n_L - 1, \quad \beta = 1, \quad K_L = r\alpha - \binom{r}{2}. $$

Thus code $\mathcal{C}$ has length $n = m n_L + \Delta$, scalar dimension $K = m K_L$ and $d_{\min}$ satisfying the upper bound in (94), given by

$$ \begin{aligned}
d_{\min} &= n - \frac{K}{K_L}\, r + 1 - \left( \frac{K}{K_L} - 1 \right)(\delta - 1) && (95) \\
&= \Delta + \delta. && (96)
\end{aligned} $$
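Proof: All claims in the theorem are clear with the exception of the claim concerning the minimum distance. Since $K_L \mid K$, an upper bound on $d_{\min}$ from (94) is given by

$$ \begin{aligned}
d_{\min} &\le n - \frac{K}{K_L}\, r + 1 - \left( \frac{K}{K_L} - 1 \right)(\delta - 1) && (97) \\
&= (n - mr + 1) - (m - 1)(\delta - 1) && (98) \\
&= n - m(r + \delta - 1) + \delta && (99) \\
&= \delta + \Delta. && (100)
\end{aligned} $$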
To show that the code satisfies the above bound with equality, it suffices to show that any pattern of $\delta + \Delta - 1$ erasures can be corrected by the code. Towards this, we note that the scalar code $\mathcal{A}$ employed in Construction 7.1 has minimum distance given by

$$ D_{\min} = D_L + \Delta\alpha = \Delta\alpha + N_L - K_L + 1. $$

Given any pattern of $\delta + \Delta - 1$ erasures in the vector code $\mathcal{C}$, by using the structure of the repair-by-transfer MBR local codes, we will evaluate the number of scalar symbols of the pyramid code $\mathcal{A}$ that are erased and show that this number is at most $D_{\min} - 1$. This would imply that the pyramid code $\mathcal{A}$ can recover from this many erasures and thus, so can the vector code $\mathcal{C}$.

As a first step we note that at least $\delta - 1$ vector code symbols, out of any pattern of $\delta + \Delta - 1$ erased vector code symbols, come from the union of the local codes. We will now argue that the maximum number of scalar symbols lost on erasing the first $\delta - 1$ vector code symbols from the union of the local codes is $\binom{\delta - 1}{2}$.

Assume that a pattern of $\delta + \Delta - 1$ vector code symbol erasures is given. For this pattern, we further restrict ourselves to the first $\delta - 1$ vector code symbols erased from the union of the local codes. Let $\gamma_i$, $1 \le i \le m$, be the number of codeword symbols erased from the $i$-th local code. Note that $0 \le \gamma_i \le \delta - 1 < n_L$, $1 \le i \le m$, and $\sum_{i=1}^{m} \gamma_i = \delta - 1$. Next, we observe that the number of scalar symbols lost when $\delta - 1$ nodes are lost from the union of local codes equals

$$ \sum_{i=1}^{m} \binom{\gamma_i}{2} \le \binom{\delta - 1}{2}. $$

The loss of $\Delta$ more vector code symbols can cause the loss of at most $\Delta\alpha$ more scalar code symbols. Thus we have that the maximum number $L$ of scalar code symbols lost as a result of $\Delta + \delta - 1$ erasures is given by

$$ L \le \Delta\alpha + \binom{\delta - 1}{2} = D_L - 1 + \Delta\alpha = D_{\min} - 1, $$

and the result follows.
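The counting step in this proof can be checked by brute force on small cases. The sketch below assumes the usual description of a repair-by-transfer MBR code in which each scalar symbol sits on an edge of the complete graph on $n_L$ nodes and is stored by both endpoints, so a scalar symbol is lost only when both of its nodes are erased. The function names are hypothetical.

```python
from itertools import combinations, product
from math import comb


def lost_scalar_symbols(n_L, erased_nodes):
    """Scalar symbols of one RBT MBR local code that survive on no node."""
    edges = set(combinations(range(n_L), 2))   # one scalar symbol per edge
    surviving = {e for e in edges
                 if e[0] not in erased_nodes or e[1] not in erased_nodes}
    return len(edges - surviving)


def worst_case_loss(m, n_L, erasures):
    """Maximum scalar loss over all ways to spread the node erasures
    across m support-disjoint local codes."""
    worst = 0
    for gammas in product(range(erasures + 1), repeat=m):
        if sum(gammas) == erasures and max(gammas) <= n_L:
            loss = sum(lost_scalar_symbols(n_L, set(range(g))) for g in gammas)
            worst = max(worst, loss)
    return worst


if __name__ == "__main__":
    delta, r, m = 4, 2, 3
    n_L = r + delta - 1
    print(worst_case_loss(m, n_L, delta - 1), comb(delta - 1, 2))
```

The worst case is attained when all $\delta - 1$ erasures fall in a single local code, which is why the bound $\binom{\delta-1}{2}$ appears.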



B. Existence of Optimal MBR-Local Codes with $(r, \delta)$ All-Symbol Locality

Construction 7.3: The aim of this construction is to build an optimal MBR-local code $\mathcal{C}$ with $(r, \delta)$-all-symbol locality composed of $m$ support-disjoint MBR codes, each of which is a repair-by-transfer (RBT) $((n_L, r, d), (\alpha, \beta), K_L)$ MBR code, using optimal scalar all-symbol locality codes, whose existence has been shown in Theorem 3.4. The parameters of each RBT MBR code satisfy

$$ n_L = r + \delta - 1, \quad \alpha = d = n_L - 1, \quad \beta = 1, \quad K_L = r\alpha - \binom{r}{2}. $$

The desired global code $\mathcal{C}$ will have length $n = m n_L$ and scalar dimension $K$ that is assumed to be a multiple $K = \ell K_L$ for some positive integer $\ell \le m$. The construction will proceed in two stages:

Stage 1: Let us define $N_L = \binom{n_L}{2}$ and set $D_L = N_L - K_L + 1$. Then in the first stage, a scalar code $\mathcal{A}$ with $(K_L, D_L)$ all-symbol locality, of length $m N_L$ and dimension $K = \ell K_L$, and which moreover is optimal with respect to the bound on minimum distance, is assumed to be given. The existence of such a code is shown in Theorem 3.4. As $K_L \mid K$, it also follows from Theorem 3.2 that $\mathcal{A}$ is composed of $m$ support-disjoint local MDS codes with parameters $[N_L, K_L, D_L]$. The global scalar code $\mathcal{A}$ has parameters $[m N_L, \ell K_L, D_L + (m - \ell) N_L]$.

Stage 2: In the second stage, the $N_L$ symbols that correspond to the $i$-th local code $\mathcal{A}_i$ are regarded as MDS-coded symbols with parameters $[N_L, K_L, D_L]$ and used to construct a repair-by-transfer $((n_L, r, d), (\alpha, \beta), K_L)$ MBR code.

This completes the construction.

An example of Construction 7.3 is illustrated in Figure 6.
Theorem 7.4: Construction 7.3 results in an MBR-local code $\mathcal{C}$ with $(r, \delta)$ all-symbol locality composed of $m$ support-disjoint local codes $\{\mathcal{C}_i\}_{i=1}^{m}$, each of which is a repair-by-transfer (RBT) $((n_L, r, d), (\alpha, \beta), K_L)$ MBR code, along with $\Delta$ global parity symbols. The parameters of each RBT MBR code satisfy

$$ n_L = r + \delta - 1, \quad \alpha = d = n_L - 1, \quad \beta = 1, \quad K_L = r\alpha - \binom{r}{2}. $$

Thus code $\mathcal{C}$ has $n = m n_L$ and scalar rank $K = \ell K_L$ for some positive integer $\ell \le m$, and $d_{\min}$ satisfying the upper bound in (94), given by

$$ \begin{aligned}
d_{\min} &= n - \frac{K}{K_L}\, r + 1 - \left( \frac{K}{K_L} - 1 \right)(\delta - 1) && (101) \\
&= (m - \ell)\, n_L + \delta. && (102)
\end{aligned} $$

Proof: See Appendix F.



Remark 7: It follows from the conditions derived in Theorem 5.4 of Section V-C that both constructions presented in this section are rate optimal. We also note that the rank accumulation profile is strictly sub-additive for MBR codes and hence it is not possible to construct any $d_{\min}$-optimal MBR-local codes without support-disjoint local codes.



VIII. Bound on $d_{\min}$ Based on Quasi-Dimension

In this section, we derive bounds on minimum distance for vector codes possessing locality. Unlike in Section V, we do not assume exact locality in this section.

The case for locality in vector codes with $\delta = 2$ has been previously considered in [23], where it was shown that $d_{\min}$, under $(r, \delta = 2)$-all-symbol locality, is upper bounded by

$$ d_{\min} \le n - \left\lceil \frac{K}{\alpha} \right\rceil + 1 - \left( \left\lceil \frac{K}{r\alpha} \right\rceil - 1 \right). \qquad (103) $$



A. Bound on Minimum Distance for Vector Codes with Locality

We obtain below an upper bound on the minimum distance of a vector code, in the presence of $(r, \delta)$ information locality, that holds for all $\delta \ge 2$ and which, when specialized to the case $\delta = 2$, is in general tighter than the bound in (103).

Theorem 8.1: Consider an $[n, K, d_{\min}, \alpha, \kappa]$ vector code $\mathcal{C}$ with $(r, \delta)$ information locality. Let the local punctured codes of length at most $n_L = r + \delta - 1$ and minimum distance at least $\delta$ be $\{\mathcal{C}_i, i \in \mathcal{L}\}$ and their supports be $S_i, i \in \mathcal{L}$ respectively. Set $S = \cup_{i \in \mathcal{L}} S_i$. Then, the minimum distance of $\mathcal{C}$ is upper bounded by

$$ \begin{aligned}
d_{\min} &\le n - |\mathcal{I}_0| + 1 - \left( \left\lceil \frac{|\mathcal{I}_0|}{r} \right\rceil - 1 \right)(\delta - 1) && (\mathcal{I}_0\text{-bound}) && (104) \\
&\le n - \kappa + 1 - \left( \left\lceil \frac{\kappa}{r} \right\rceil - 1 \right)(\delta - 1) && (\kappa\text{-bound}) && (105) \\
&\le n - \left\lceil \frac{K}{\alpha} \right\rceil + 1 - \left( \left\lceil \frac{K}{r\alpha} \right\rceil - 1 \right)(\delta - 1) && (K\text{-bound}) && (106)
\end{aligned} $$

where $\mathcal{I}_0$ is a minimum cardinality information set for $\mathcal{C}|_S$ or equivalently, $|\mathcal{I}_0|$ equals the quasi-dimension of $\mathcal{C}|_S$, i.e., $|\mathcal{I}_0| = \text{q-dim}(\mathcal{C}|_S)$.

Proof: See Appendix G.

Remark 8: For the same set of code parameters $[n, K, d_{\min}, \alpha]$, it is possible to construct codes having different values of $\kappa$ and $|\mathcal{I}_0|$. Thus $\kappa$ and $|\mathcal{I}_0|$ depend upon finer structural details of both the global and local codes. Thus while the $K$-bound is a global bound on the minimum distance, the $\kappa$- and $\mathcal{I}_0$-bounds may be regarded as structure-dependent bounds.

The MSR-local code constructions presented in Section VI of this paper achieve the $K$-bound given in (106) with equality. On the other hand, the MBR-local code constructions in Section VII are examples of codes that do not meet the $K$-bound but which achieve the $\mathcal{I}_0$-bound. It also turns out that amongst the codes constructed using Construction 8.4 there are examples of codes that achieve the $\kappa$-bound but which do not achieve the $K$-bound.

We now discuss necessary conditions for achieving equality in the $\mathcal{I}_0$-bound, whenever $r$ divides $|\mathcal{I}_0|$, where $\mathcal{I}_0$ is as defined in the statement of Theorem 8.1 and is a reference to a minimum cardinality information set for the restriction of the code $\mathcal{C}$ to the union of the supports of the local codes.

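Theorem 8.2: Consider an $[n, K, d_{\min}, \alpha, \kappa]$ vector code $\mathcal{C}$ having $(r, \delta)$ information locality that is optimal with respect to the $\mathcal{I}_0$-bound. We assume further that $r \mid |\mathcal{I}_0|$ and set $\frac{|\mathcal{I}_0|}{r} = t$. Let $\{\mathcal{C}_i\}_{i \in \mathcal{L}}$ be the set of all local codes whose length is at most $r + \delta - 1$ and distance is at least $\delta$, and let $\{S_i\}_{i \in \mathcal{L}}$ respectively be their supports. Then

(a) $\mathcal{C}_i$ is an $[r + \delta - 1, \dim(\mathcal{C}_i) \le r\alpha, \delta, \alpha, r]$ erasure optimal code $\forall\, i \in \mathcal{L}$, and

(b) for distinct $i_1, i_2 \in \mathcal{L}$, the codes $\mathcal{C}_{i_1}$ and $\mathcal{C}_{i_2}$ are support disjoint, i.e.,

$$ S_{i_1} \cap S_{i_2} = \emptyset. \qquad (107) $$

Proof: The proof is along the same lines as the proof of Theorem 3.2 for the scalar case and is hence omitted.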



B. Optimal Vector Codes with Locality through Stacking

By stacking $\alpha$ scalar codes with locality, one trivially obtains a vector code with locality. More specifically, let $\mathcal{B}$ be a scalar local code having parameters $[n, k, d_{\min}]$ and let $\mathcal{C}$ be the code obtained by stacking $\alpha$ codewords, each drawn from $\mathcal{B}$, to obtain a codeword from $\mathcal{C}$. It is straightforward to verify that $\mathcal{C}$ has code parameters $[n, K, d_{\min}, \alpha, \kappa]$ where $K = k\alpha$ and $\kappa = k$. By numerically comparing the bounds Q and (106) on minimum distance in the scalar and vector case respectively, it follows that the vector code is optimal with information or all-symbol locality whenever the scalar code is optimal in the same sense. This observation is made formal in the following theorem.

Theorem 8.3: For any set of parameters $n, k, \alpha, r, \delta$, $\delta \ge 2$, the following optimal vector codes can be constructed via stacking:

1) an explicit $(r, \delta)$ information locality code,

2) an explicit $(r, \delta)$ all-symbol locality code, whenever $n = \lceil \frac{k}{r} \rceil (r + \delta - 1)$,

3) a non-explicit $(r, \delta)$ all-symbol locality code, whenever $(r + \delta - 1) \mid n$ and the field size $q > k n^k$.

The minimum distance of all the three classes of codes is given by the equality in the $K$-bound.

Proof: The three classes of codes are respectively obtained by stacking $\alpha$ independent codewords of the following classes of optimal scalar codes with locality:

1) pyramid codes, which are explicit $(r, \delta)$ information locality codes,

2) parity splitting codes, which are explicit $(r, \delta)$ all-symbol locality codes, whenever $n = \lceil \frac{k}{r} \rceil (r + \delta - 1)$,

3) $(r, \delta)$ all-symbol locality codes whose existence is known, whenever $(r + \delta - 1) \mid n$ and the field size $q > k n^k$.
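The effect of stacking on minimum distance can be checked exhaustively on a toy example. The sketch below assumes a small binary scalar code given by a generator matrix (the single parity-check code of length 3 is used purely as a stand-in; it is not one of the optimal codes named in the proof), stacks $\alpha$ of its codewords and measures the vector Hamming weight, i.e., the number of non-zero columns. The function names are hypothetical.

```python
from itertools import product


def scalar_codewords(G):
    """All codewords of the binary code with generator matrix G (list of rows)."""
    k = len(G)
    words = []
    for msg in product((0, 1), repeat=k):
        words.append(tuple(sum(m * g for m, g in zip(msg, col)) % 2
                           for col in zip(*G)))
    return words


def min_distance_stacked(G, alpha):
    """Minimum vector Hamming weight of the code obtained by stacking
    alpha independently chosen codewords of the scalar code."""
    words = scalar_codewords(G)
    best = None
    for rows in product(words, repeat=alpha):
        # Vector symbol j is the j-th column of the stacked array.
        weight = sum(1 for col in zip(*rows) if any(col))
        if weight > 0 and (best is None or weight < best):
            best = weight
    return best


if __name__ == "__main__":
    G = [[1, 0, 1],
         [0, 1, 1]]          # [3, 2, 2] single parity-check code
    print([min_distance_stacked(G, a) for a in (1, 2, 3)])
```

The minimum distance is unchanged by stacking because a stacked array with a single non-zero row has the same support as that row, while any non-zero array has at least as many non-zero columns as one of its non-zero rows.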



C. A Class of Optimal and Explicit $(r, \delta)$ All-Symbol Locality Vector Codes

An explicit construction for obtaining optimal codes with $(r, \delta)$ all-symbol locality, for the case of $\delta = 2$, was presented in [23]. This construction has a straightforward extension for any arbitrary $\delta > 2$. The construction as well as its extension are described below.

Construction 8.4: Pick a message matrix $M$ of size $r \times k'$, $k' > 0$, such that $(r + 1) \mid n$. The encoding takes place in two stages: in the first stage, the message matrix is encoded by a product code, wherein the row code is chosen as an $[n, k']$ MDS code and the column code is a parity-check code. Let $\mathbf{c}'$ denote the $((r + 1) \times n)$ codeword array obtained after the first stage. In the second stage, the set of all columns of $\mathbf{c}'$ is partitioned into contiguous sets of size $r + 1$ each and the $i$-th, $1 \le i \le (r + 1)$, row of each partition is cyclically shifted by $(i - 1)$ scalar symbols. As an illustration, the first partition after the cyclic permutation would look like

$$ \begin{bmatrix}
c_{1,1} & c_{1,2} & c_{1,3} & \cdots & c_{1,r+1} \\
c_{2,r+1} & c_{2,1} & c_{2,2} & \cdots & c_{2,r} \\
\vdots & \vdots & \vdots & & \vdots \\
c_{r+1,2} & c_{r+1,3} & c_{r+1,4} & \cdots & c_{r+1,1}
\end{bmatrix}. \qquad (108) $$

Note that the code has $\alpha = (r + 1)$. In this encoded structure, it is clear that every column of the codeword array is locally covered by an $[r + 1, r, 2]$ code, in which the remaining $r$ columns come from the partition to which the column belongs. For example, if the first column of the first partition fails, this column can be recovered by accessing the entire contents of all the remaining $r$ columns of this partition and computing the parities. Thus the code has $(r, \delta = 2)$-all-symbol locality. It was also shown that whenever $(r + 1) \nmid k'$, the minimum distance of the code is given by the equality condition in (106).
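The local repair property of the cyclically shifted array can be illustrated on a single partition. The sketch below works over a hypothetical prime field of size `P = 7`, uses arbitrary rows as stand-ins for the MDS row codewords (the row code plays no role in local repair), and takes the column code to be a single parity check, so that every column of the unshifted array sums to zero. All names are assumptions introduced for this example.

```python
P = 7  # hypothetical prime field size used only for this illustration


def encode_partition(rows):
    """Append a parity row so that each column of the array sums to 0 mod P."""
    parity = [(-sum(col)) % P for col in zip(*rows)]
    return [list(row) for row in rows] + [parity]


def cyclic_shift(array):
    """Cyclically shift row i (0-indexed) to the right by i positions."""
    w = len(array[0])
    return [[row[(j - i) % w] for j in range(w)] for i, row in enumerate(array)]


def repair_column(shifted, failed):
    """Rebuild column `failed` of a shifted (r+1) x (r+1) partition
    from the other r columns only."""
    h, w = len(shifted), len(shifted[0])
    repaired = []
    for i in range(h):
        t = (failed - i) % w            # original column of the lost entry
        total = 0
        for i2 in range(h):
            if i2 != i:
                col = (t + i2) % w      # where c[i2][t] sits after shifting
                assert col != failed    # never reads the failed column
                total += shifted[i2][col]
        repaired.append((-total) % P)
    return repaired


if __name__ == "__main__":
    rows = [[1, 2, 3, 4], [5, 6, 0, 1], [2, 2, 2, 2]]   # r = 3
    shifted = cyclic_shift(encode_partition(rows))
    for j in range(4):
        print(j, repair_column(shifted, j) == [shifted[i][j] for i in range(4)])
```

Because of the shifts, the $r + 1$ entries of a failed column belong to $r + 1$ different parity checks, and the remaining entries of each check are spread over the other $r$ columns of the partition.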

The extension: In the extension, the requirement $(r + 1) \mid n$ is replaced by the condition $(r + \delta - 1) \mid n$. To construct a code with $(r, \delta)$-all-symbol locality for the general case $\delta > 2$, we just use an $[r + \delta - 1, r, \delta]$ MDS code as the column code in place of the parity-check code when building the product code. Thus in the second stage, the set of columns of $\mathbf{c}'$ is partitioned into contiguous sets of size $(r + \delta - 1)$ each and a similar cyclic permutation is carried out as before. It is not hard to verify that this code has $(r, \delta)$-all-symbol locality. We summarize the above discussion about the construction, its optimality and rate in the theorem below.



Theorem 8.5: Given any $n, \kappa, r, \delta$ such that $(r + \delta - 1) \mid n$, Construction 8.4 yields a vector code with $(r, \delta)$-all-symbol locality, where the parameter $k'$ in Construction 8.4 is chosen as

$$ k' = \kappa + \left( \left\lceil \frac{\kappa}{r} \right\rceil - 1 \right)(\delta - 1). \qquad (109) $$

The code has $\alpha = r + \delta - 1$, rate $\rho = \frac{k' r}{n\alpha}$ and minimum distance $d_{\min}$ achieving the $\kappa$-bound with equality, given by (105).

Proof: First of all, note that the parameter $k'$ in (109) is chosen such that the code obtained through Construction 8.4 will have quasi-dimension $\kappa$. This follows from the fact that if $k' = \theta(r + \delta - 1) + \gamma$, $\theta \ge 0$, $0 \le \gamma \le r + \delta - 2$, then the quasi-dimension of the code obtained is given by

$$ \begin{cases} \theta r + \gamma, & \text{if } 0 \le \gamma \le r - 1, \\ \theta r + r, & \text{if } r \le \gamma \le r + \delta - 2. \end{cases} \qquad (110) $$

Clearly, since each row of the code is $[n, k']$ MDS, any $n - k'$ erasures can be tolerated by the overall vector code and hence the minimum distance of the code can be lower bounded as

$$ \begin{aligned}
d_{\min} &\ge n - k' + 1 && (111) \\
&= n - \kappa + 1 - \left( \left\lceil \frac{\kappa}{r} \right\rceil - 1 \right)(\delta - 1). && (112)
\end{aligned} $$

Combining (112) with Theorem 8.1, the claim about the minimum distance follows.



Remark 9: The following comments are in order regarding Construction 8.4.

1) Construction 8.4, whenever $\delta > 2$, is an instance where the $K$-bound on the minimum distance is not always achievable. For example, if $\delta = 3$, $r = 4$ and $k' = 9 = (r + \delta - 1) + 3$, then $\kappa = r + 3 = 7$. But $K = k' r = 36$ and hence $\frac{K}{\alpha} = \frac{36}{6} = 6 < \kappa$. Thus, (105) is strictly tighter than (106).

2) Unlike the optimal codes presented in Theorem 8.3, the rate of the optimal code obtained via Construction 8.4 can be less than $\frac{\kappa}{n}$; the parameters given above constitute an example.
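The numbers in Remark 9 can be reproduced with a short sketch. It assumes the quasi-dimension rule (110) and takes the choice of $k'$ to be $k' = \kappa + (\lceil \kappa/r \rceil - 1)(\delta - 1)$, which is the value implied by equating (111) and (112). The function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def quasi_dimension(k_prime, r, delta):
    """Quasi-dimension of the Construction 8.4 code, following (110)."""
    theta, gamma = divmod(k_prime, r + delta - 1)
    return theta * r + min(gamma, r)


def row_code_dimension(kappa, r, delta):
    """Row-code dimension k' giving quasi-dimension kappa (assumed form of (109))."""
    return kappa + (ceil_div(kappa, r) - 1) * (delta - 1)


if __name__ == "__main__":
    r, delta, k_prime = 4, 3, 9
    alpha = r + delta - 1
    kappa = quasi_dimension(k_prime, r, delta)
    K = k_prime * r
    print(kappa, K, K // alpha)   # quasi-dimension, scalar dimension, K / alpha
```

For the parameters of the remark the quasi-dimension exceeds $K/\alpha$, so the $\kappa$-bound (105) is strictly smaller than the $K$-bound (106).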
Appendix A
Proof of Theorem 3.1

We will make use of the following lemma (see [7]) in the proof:

Lemma A.1: Given any set $T \subseteq [n]$ such that $\mathrm{Rank}(G|_T) \le k - 1$, we have

$$ d_{\min} \le n - |T|, \qquad (113) $$

with equality iff $T \subseteq [n]$ is of largest size such that $\mathrm{Rank}(G|_T) = k - 1$.

Proof of Theorem 3.1

Assume that we are given an $[n, k, d_{\min}]$ scalar code $\mathcal{C}$ which has $(r, \delta)$ information locality. As in the proof of Theorem 5 in Q, we will construct, using Algorithm 2 below, a set $T \subseteq [n]$ such that $\mathrm{Rank}(G|_T) \le k - 1$ and then apply Lemma A.1 to get the required result. We define $\forall\, i \in \mathcal{L}$, $V_i = \mathrm{Col}(G|_{S_i})$, the column space of the matrix $G|_{S_i}$.

Algorithm 2 Used in the Proof of Theorem 3.1

  Let T_0 = { }, j = 0
  while 1 do
    Pick i ∈ L such that V_i ⊄ Col(G|_{T_j})
    if Rank(G|_{T_j ∪ S_i}) ≤ k - 1 then
      j = j + 1
      T_j = T_{j-1} ∪ S_i
    else if Rank(G|_{T_j ∪ S_i}) = k then
      Pick any maximal subset S' of S_i such that Rank(G|_{T_j ∪ S'}) = k - 1
      S_end = S_i
      j = j + 1
      T_j = T_{j-1} ∪ S'
      Exit
    end if
  end while
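The greedy procedure above can be tried out on a toy code. The following is a minimal sketch over GF(2), in which columns of the generator matrix are encoded as integer bitmasks; the example code (two $[3, 2, 2]$ local parity-check codes plus one global parity) and all function names are assumptions introduced for illustration, not codes from the paper.

```python
def rank_gf2(columns):
    """Rank over GF(2) of columns given as integer bitmasks."""
    pivots = {}
    for v in columns:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]
    return len(pivots)


def greedy_T(G_cols, supports, k):
    """Build a set T of column indices with Rank(G|_T) = k - 1,
    adding whole local-code supports for as long as possible."""
    def rank(idx):
        return rank_gf2([G_cols[c] for c in idx])

    T = set()
    while True:
        # Pick a local code whose column space is not inside Col(G|_T).
        S = next(S for S in supports if rank(T | S) > rank(T))
        if rank(T | S) <= k - 1:
            T |= S
        else:
            # Rank would reach k: add a maximal subset keeping rank k - 1.
            for c in sorted(S):
                if rank(T | {c}) <= k - 1:
                    T.add(c)
            return T


def min_distance(G_cols, k):
    """Brute-force minimum distance of the binary code."""
    return min(sum(bin(m & c).count("1") % 2 for c in G_cols)
               for m in range(1, 1 << k))


if __name__ == "__main__":
    # Columns: x1, x2, x1+x2, x3, x4, x3+x4, x1+x2+x3+x4  (k = 4, n = 7)
    G_cols = [0b0001, 0b0010, 0b0011, 0b0100, 0b1000, 0b1100, 0b1111]
    supports = [{0, 1, 2}, {3, 4, 5}]       # r = 2, delta = 2
    T = greedy_T(G_cols, supports, k=4)
    print(sorted(T), 7 - len(T), min_distance(G_cols, 4))
```

On this example the procedure returns a set of size 4, so Lemma A.1 gives $d_{\min} \le 7 - 4 = 3$, which coincides with $n - k + 1 - (\lceil k/r \rceil - 1)(\delta - 1)$ for $n = 7$, $k = 4$, $r = 2$, $\delta = 2$.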



With respect to the $j$-th iteration of Algorithm 2, note that as long as $\mathrm{Rank}(G|_{T_j}) \le k - 1$, one can always pick an $i \in \mathcal{L}$ such that $V_i \not\subseteq \mathrm{Col}(G|_{T_j})$. Let the algorithm exit after $J$ iterations, i.e., $j = J$ when the algorithm exits. Note that necessarily, $\mathrm{Rank}(G|_{T_{J-1} \cup S_{\mathrm{end}}}) = k$. Clearly, as each local code has dimension at most $r$, it must then be true that

$$ J \ge \left\lceil \frac{k}{r} \right\rceil. \qquad (114) $$

Next, for $j \in [J]$, let

$$ s_j = |T_j| - |T_{j-1}|, \qquad \nu_j = \dim(\mathcal{C}|_{T_j}) - \dim(\mathcal{C}|_{T_{j-1}}) = \mathrm{Rank}(G|_{T_j}) - \mathrm{Rank}(G|_{T_{j-1}}). \qquad (115) $$

We claim that for $j \in [J - 1]$,

$$ s_j \ge \nu_j + (\delta - 1). \qquad (116) $$

This follows because the local codes have minimum distance at least equal to $\delta$ and hence, the last $\delta - 1$ code symbols of a local code are known given the rest of the code symbols. We also have that

$$ s_J \ge \nu_J. \qquad (117) $$
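Summing up, we obtain that

\begin{align}
|T_J| = \sum_{j=1}^{J} s_j &\ge \sum_{j=1}^{J} \nu_j + (J - 1)(\delta - 1) \tag{118}\\
&\ge \sum_{j=1}^{J} \nu_j + \left(\left\lceil \frac{k}{r} \right\rceil - 1\right)(\delta - 1) \tag{119}\\
&= k - 1 + \left(\left\lceil \frac{k}{r} \right\rceil - 1\right)(\delta - 1). \tag{120}
\end{align}

The result then follows from an application of Lemma A.1.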



Appendix B 
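Proof of Theorem 3.2

(a) For $d_{\min}$ to achieve the bound of Theorem 3.1, we need that (118) and (119) must be satisfied with equality. We reproduce the chain of inequalities for the case when $r \mid k$ here for the sake of convenience:

\begin{align}
|T_J| = \sum_{j=1}^{J} s_j &\overset{(i)}{\ge} \sum_{j=1}^{J} \nu_j + (J - 1)(\delta - 1) \tag{121}\\
&\overset{(ii)}{\ge} \sum_{j=1}^{J} \nu_j + \left(\frac{k}{r} - 1\right)(\delta - 1) \tag{122}\\
&= k - 1 + \left(\frac{k}{r} - 1\right)(\delta - 1). \tag{123}
\end{align}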



Optimality of the code with respect to the bound on $d_{\min}$ in Theorem 3.1 implies that equality holds in both inequalities, (i) and (ii), above. Equality in (i), coupled with the fact that $s_j \ge \nu_j + \delta - 1$, $1 \le j \le J - 1$ and $s_J \ge \nu_J$ imply that

$$s_j = \begin{cases} \nu_j + \delta - 1, & \text{if } 1 \le j \le J - 1, \\ \nu_J, & \text{if } j = J. \end{cases}$$

Equality in (ii), coupled with the fact that $J \ge \frac{k}{r}$ gives us that

$$J = \frac{k}{r}.$$

Thus,

$$\sum_{i=1}^{J-1} \nu_i + \nu_J = k - 1 = Jr - 1 = (J - 1)r + r - 1.$$

Coupled with the fact that $\nu_j \le r$, $1 \le j \le J - 1$ and $\nu_J \le r - 1$, we obtain

$$\nu_i = \begin{cases} r, & \text{if } 1 \le i \le J - 1, \\ r - 1, & \text{if } i = J. \end{cases}$$

This leads to

$$s_i = \begin{cases} r + \delta - 1, & \text{if } 1 \le i \le J - 1, \\ r - 1, & \text{if } i = J. \end{cases} \tag{124}$$

Also note that since Algorithm 2 can start with any local code, all local codes have length $n_L = r + \delta - 1$ and dimension $r$ and hence are $[r + \delta - 1, r, \delta]$ MDS codes, thus proving (a).
(b) First of all, note from (124) that with the possible exception of the last code picked by the algorithm, all the remaining codes must have pairwise disjoint support. The equality $\nu_J = r - 1$ implies that the increase in dimension resulting from replacing $S''$ by $S_{\text{end}}$ would be $r$ as opposed to $(r - 1)$ with $S''$. Now if the last code overlapped with the union of the rest, since the last $\delta - 1$ code symbols of any local code are dependent on its first $r$ symbols, the increase in dimension due to $S_{\text{end}}$ cannot be $r$. It follows that the last code $\mathcal{C}_J$ picked by the algorithm must also have support that is disjoint from the support of any of the prior local codes $\mathcal{C}_i$, $1 \le i \le J - 1$ picked by the algorithm. Thus in summary, all the codes picked by Algorithm 2 must have pairwise disjoint support.

It remains only to show that the collection of local codes in $\mathcal{C}$ are support disjoint, even when we include local codes belonging to $\mathcal{C}$, but not encountered by Algorithm 2. We note first that for $i_1, i_2 \in \mathcal{L}$, $i_1 \ne i_2$, if $V_{i_1} \ne V_{i_2}$, we could pick $\mathcal{C}_{i_1}$ and $\mathcal{C}_{i_2}$ (not necessarily in that order) as the first two codes used in Algorithm 2 and hence obtain that

$$S_{i_1} \cap S_{i_2} = \emptyset.$$

Thus we only have to consider the case when two distinct local codes, say $\mathcal{C}_{i_1}, \mathcal{C}_{i_2}$, are such that $V_{i_1} = V_{i_2}$. If we were to run the algorithm, beginning with the code $\mathcal{C}_{i_1}$, at the conclusion of the algorithm, we would obtain a set $T_J$ such that $\text{Rank}(G|_{T_J}) = k - 1$ and $|T_J| = k - 1 + (\frac{k}{r} - 1)(\delta - 1)$. This follows, because as the code is optimal, every instance of the algorithm must yield a set $T_J$ satisfying (118) and (119) with equality. Let $\mathcal{T}$ be the set of all local codes encountered by the algorithm in this case. We could then replace $T_J$ by

$$T'_J = T_J \cup S_{i_2},$$

and since $V_{i_1} = V_{i_2}$, we would obtain that even with this augmentation of support, $\text{Rank}(G|_{T'_J}) = k - 1$. If $S_{i_2} \not\subseteq \cup_{j \in \mathcal{T}} S_j$, we would have that $|T'_J| > k - 1 + (\frac{k}{r} - 1)(\delta - 1)$. But this would imply a tighter bound on $d_{\min}$, which would contradict our assumption that the code under consideration satisfies the earlier bound on $d_{\min}$. On the other hand, if $S_{i_2} \subseteq \cup_{j \in \mathcal{T}} S_j$, then we can start the algorithm with $\mathcal{C}_{i_2}$, and pick the same sequence of codes we picked when we started the algorithm beginning with $\mathcal{C}_{i_1}$. This can be done as $V_{i_1} = V_{i_2}$. But this would lead us to conclude that the support $S_{i_2}$ of the code $\mathcal{C}_{i_2}$ and

$$\cup_{j \in \mathcal{T}} S_j \setminus S_{i_1}$$

were disjoint, which by $S_{i_2} \subseteq \cup_{j \in \mathcal{T}} S_j$, would then force

$$S_{i_1} = S_{i_2}.$$

But this would imply that $\mathcal{C}_{i_1} = \mathcal{C}_{i_2}$, contradicting our earlier assumption that the codes were distinct. It follows that all the local codes $\mathcal{C}_i$, $i \in \mathcal{L}$ have disjoint supports.

(c) To prove the third assertion in the theorem, first of all note that if $i_1, i_2, \ldots, i_t \in \mathcal{L}$ are such that $\mathcal{C}_{i_j}$ is picked by Algorithm 2 in the $j$-th step, then it must be that

$$V_{i_j} \cap \Big( \sum_{\ell \ne j} V_{i_\ell} \Big) = \{0\} \tag{125}$$

simply because

$$\dim \Big( \sum_{j=1}^{t} V_{i_j} \Big) = k = rt,$$

and $\dim(V_i) \le r$, $\forall i \in \mathcal{L}$. It remains to be proved that the assertion is true for any ordered set of indices $i_1, i_2, \ldots, i_t$ belonging to $\mathcal{L}$. If we can show that it is possible for the algorithm to proceed in such a way that $\mathcal{C}_{i_j}$ is the local code picked in the $j$-th step, then from (125), the assertion would be proved.

Assume that we are given an ordered set of indices $i_1, i_2, \ldots, i_t \in \mathcal{L}$ and that it is not possible for the algorithm to proceed in such a way that $\mathcal{C}_{i_j}$ is the local code picked in the $j$-th step. Let $m$ be the first index at which the algorithm runs into trouble, i.e., the algorithm is unable to pick the code $\mathcal{C}_{i_m}$ during the $m$-th iteration. This can happen only if

$$V_{i_m} \subseteq \sum_{j=1}^{m-1} V_{i_j}.$$

Suppose next, that the algorithm were allowed to proceed beyond the $(m - 1)$-th step by picking indices without restriction (as opposed to picking them from the given sequence $\{i_{m+1}, i_{m+2}, \ldots, i_t\}$) until eventually a set $T_J$ was obtained such that $\text{Rank}(G|_{T_J}) = k - 1$ and $|T_J| = k - 1 + (t - 1)(\delta - 1)$. We could then replace $T_J$ by

$$T'_J = T_J \cup S_{i_m},$$

while maintaining the property that $\text{Rank}(G|_{T'_J}) = k - 1$. From part (b) of the theorem we have that all local codes are support disjoint and it follows therefore that

$$|T'_J| = |T_J| + r + \delta - 1 > k - 1 + (t - 1)(\delta - 1).$$

This would imply a tighter bound on $d_{\min}$ and we have once again arrived at a contradiction. It follows that given any set of indices $i_1, i_2, \ldots, i_t \in \mathcal{L}$ it is possible for the algorithm to proceed in such a way that $\mathcal{C}_{i_j}$ is the local code picked in the $j$-th step. This concludes the proof.

Appendix C 
Proof of Theorem 3.4

The proof here is similar to the proof of Theorem 17 of [7]. We will state a couple of definitions and a lemma from [7], which will be useful in proving the existence of optimal codes with all-symbol locality.

Definition 10 ($k$-core [7]): Let $L$ be a subspace of $\mathbb{F}_q^n$ and $S \subseteq [n]$ be a set of size $k$. $S$ is said to be a $k$-core for $L$ if for all non-zero vectors $v \in L$, $\text{Supp}(v) \not\subseteq S$.

In our application to codes, $L$ will frequently denote the dual $\mathcal{C}^{\perp}$ of a linear code $\mathcal{C}$ of length $n$. In this setting, we note that saying that $S$ is a $k$-core for $\mathcal{C}^{\perp}$ is equivalent to saying that the $k$ columns of the generator matrix of the code $\mathcal{C}$ corresponding to $S$ are linearly independent.
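The equivalence between $k$-cores of the dual code and independent column sets can be verified exhaustively on a small example. The following sketch assumes a binary field and the [7,4,3] Hamming code as a test case; the helper names are hypothetical.

```python
from itertools import combinations

def from_bits(s):
    """Parse a 0/1 string into an integer; character j becomes bit j."""
    return sum(1 << j for j, ch in enumerate(s) if ch == "1")

def gf2_rank(vectors):
    """Rank over GF(2) of vectors given as integer bitmasks."""
    pivots = {}
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = v
                break
            v ^= pivots[lead]
    return len(pivots)

def dual_words(rows, n):
    """All non-zero vectors orthogonal to every row of G over GF(2)."""
    return [v for v in range(1, 1 << n)
            if all(bin(row & v).count("1") % 2 == 0 for row in rows)]

def is_core(S, words):
    """S is a core if no non-zero word has its support inside S."""
    mask = sum(1 << j for j in S)
    return all(v & ~mask for v in words)

def columns_independent(S, rows):
    """True if the columns of G indexed by S are linearly independent."""
    cols = [sum(((row >> j) & 1) << i for i, row in enumerate(rows)) for j in S]
    return gf2_rank(cols) == len(S)

# Assumed test case: the [7,4,3] Hamming code, k = 4, n = 7.
N, K_DIM = 7, 4
ROWS = [from_bits(s) for s in ("1000110", "0100101", "0010011", "0001111")]
DUAL = dual_words(ROWS, N)
RESULTS = [(is_core(S, DUAL), columns_independent(S, ROWS))
           for S in combinations(range(N), K_DIM)]
```

Every pair in `RESULTS` agrees, so a 4-subset is a 4-core of the dual exactly when the corresponding four columns of the generator matrix are independent.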

Definition 11 (Vectors in General Position Subject to $L$ [7]): Let $L$ be a subspace of $\mathbb{F}_q^n$. Let $G = [g_1, \cdots, g_n]$ be a $(k \times n)$ matrix over $\mathbb{F}_q$. The columns of $G$, $\{g_i\}_{i=1}^{n}$, are said to be in general position with respect to $L$ if:

• Row space of $G$, denoted by $\text{Row}(G)$, satisfies $\text{Row}(G) \subseteq L^{\perp}$.

• For all $k$-cores $S$ of $L$, we have $\text{Rank}(G|_S) = k$.

Lemma C.1 (Lemma 14 of [7]): Let $n, k, q$ be such that $q > kn^k$. Let $L$ be a subspace of $\mathbb{F}_q^n$ and $0 < k \le n - \dim(L)$. Then $\exists$ a set of vectors $\{g_i\}_{i=1}^{n}$ in $\mathbb{F}_q^k$ that are in general position with respect to $L$.

Using the above lemma, we will now prove the existence of optimal $(r, \delta)$ codes when $(r + \delta - 1) \mid n$.

Let $n = (r + \delta - 1)t$. Let $\{P_1, \cdots, P_t\}$ be a partition of $[n]$, where $|P_i| = r + \delta - 1$, $1 \le i \le t$. Let $Q_i$ be the parity check matrix of an $[r + \delta - 1, r, \delta]$ MDS code with support $P_i$. Consider the block-diagonal matrix



$$H'_{t(\delta-1) \times n} = \begin{bmatrix} Q_1 & & \\ & \ddots & \\ & & Q_t \end{bmatrix}. \tag{126}$$

Let $L = \text{Rowspace}(H')$. For any $[n, k, d_{\min}]$ code with $(r, \delta)$ all-symbol locality, (4) along with the fact that for any $(r, \delta)$ all-symbol locality code, one has that

$$d_{\min} - \delta \ge 0,$$

gives us that

$$n - k \ge \left\lceil \frac{k}{r} \right\rceil (\delta - 1) \ge \frac{k}{r}(\delta - 1).$$

Rearranging the above equation, we get

$$k \le n - t(\delta - 1), \tag{127}$$

which tells us that the requirement $k \le n - \dim(L)$ of Lemma C.1 has been met. Thus, from Lemma C.1, for large enough field size, $\exists \{g_i\}_{i=1}^{n}$, $g_i \in \mathbb{F}_q^k$, which are in general position with respect to $L$. Now consider the code $\mathcal{C}$ whose generator matrix is $G_{k \times n} = [g_1 \cdots g_n]$. Clearly, $\mathcal{C}$ is an $[n, k]$ code. Also $\mathcal{C}$ has $(r, \delta)$ all-symbol locality, as each co-ordinate of $\mathcal{C}$ has an $[r + \delta - 1, \le r, \ge \delta]$ punctured code checking on it, whose parity check matrix contains one of $Q_i$, $1 \le i \le t$.
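The rearrangement leading to (127) can be checked numerically for small parameters. The sketch below assumes the bound $d_{\min} \le n - k + 1 - (\lceil k/r \rceil - 1)(\delta - 1)$ derived in Appendix A and block length $n = (r + \delta - 1)t$; the function names and parameter ranges are illustrative choices only.

```python
def dmin_bound(n, k, r, delta):
    """Upper bound n - k + 1 - (ceil(k/r) - 1)(delta - 1) on d_min."""
    ceil_k_over_r = -(-k // r)  # integer ceiling of k / r
    return n - k + 1 - (ceil_k_over_r - 1) * (delta - 1)

def check_rearrangement(max_r=5, max_delta=5, max_t=5):
    """For n = (r + delta - 1) t, confirm that the bound can reach delta
    exactly when k <= n - t (delta - 1) = r t."""
    for r in range(1, max_r + 1):
        for delta in range(2, max_delta + 1):
            for t in range(1, max_t + 1):
                n = (r + delta - 1) * t
                for k in range(1, n + 1):
                    feasible = dmin_bound(n, k, r, delta) >= delta
                    if feasible != (k <= n - t * (delta - 1)):
                        return False
    return True

ALL_OK = check_rearrangement()
```

The proof only needs the forward direction (feasibility implies (127)); over these ranges the converse holds as well.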

It remains to prove that $d_{\min}(\mathcal{C}) = d_{\min}$ is given by the equality condition in (4). Towards this, we will show that for any set $S \subseteq [n]$ such that $\text{Rank}(G|_S) \le k - 1$, it must be true that

$$|S| \le k - 1 + (\delta - 1)\left(\left\lceil \frac{k}{r} \right\rceil - 1\right). \tag{128}$$

Assuming this to be the case, it follows that the minimum distance of the code $\mathcal{C}$ satisfies

$$d_{\min} = n - \max_{\substack{S \subseteq [n] \\ \text{Rank}(G|_S) \le k - 1}} |S| \ \ge\ n - k + 1 - \left(\left\lceil \frac{k}{r} \right\rceil - 1\right)(\delta - 1). \tag{129}$$

Combining the above equation and (4), it follows that the code $\mathcal{C}$ has the distance given in the theorem statement.

It remains to prove (128). Towards this, let $S \subseteq [n]$ be such that $\text{Rank}(G|_S) \le k - 1$. Clearly, $S$ does not contain a $k$-core, $k$ being the dimension of $\mathcal{C}$. Note that if $S$ were such that

$$|S \cap P_i| \le r \quad \forall i \in [t],$$

it would then follow that $S$ contains a $k$-core. Thus there exists some $i \in [t]$ such that $|P_i \cap S| \ge r + 1$.
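Define

$$b_\ell := |\{i \in [t] : |P_i \cap S| = r + \ell\}|, \quad 1 \le \ell \le \delta - 1.$$

For $1 \le \ell \le \delta - 1$, consider the set, $S_\ell$, obtained from $S$ by dropping $\ell$ elements of $S$ from each of the $b_\ell$ sets $\{P_i : |P_i \cap S| = r + \ell\}$. Clearly, the set $\cap_{1 \le \ell \le \delta - 1} S_\ell$ is an $(|S| - b_1 - 2b_2 - \cdots - (\delta - 1)b_{\delta - 1})$-core contained in $S$ and thus as $S$ does not contain a $k$-core,

$$|S| - (\delta - 1)\Big(\sum_{\ell=1}^{\delta-1} b_\ell\Big) \le |S| - b_1 - 2b_2 - \cdots - (\delta - 1)b_{\delta-1} \le k - 1. \tag{130}$$

Also if we pick $r$ co-ordinates from each $P_i$ which is such that $|P_i \cap S| \ge r + 1$, we get an $\big(r \sum_{j=1}^{\delta-1} b_j\big)$-core contained in $S$. Thus as $S$ does not contain a $k$-core,

$$\sum_{j=1}^{\delta-1} b_j \le \left\lfloor \frac{k - 1}{r} \right\rfloor = \left\lceil \frac{k}{r} \right\rceil - 1. \tag{131}$$

Combining (130) and (131), we have

$$|S| \le k - 1 + \left(\left\lceil \frac{k}{r} \right\rceil - 1\right)(\delta - 1). \tag{132}$$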



Appendix D 
Proof of Theorem [63] 

We begin with a useful lemma. 

Lemma D.1 (Combinatorial Nullstellensatz (Thm. 1.2 of [38])): Let $\mathbb{F}$ be a field, and let $f = f(x_1, \ldots, x_n)$ be a polynomial in $\mathbb{F}[x_1, \ldots, x_n]$. Suppose the degree $\deg(f)$ of $f$ is expressible in the form $\sum_{i=1}^{n} t_i$, where each $t_i$ is a non-negative integer and suppose that the coefficient of the monomial term $\prod_{i=1}^{n} x_i^{t_i}$ in $f$ is nonzero. Then, if $S_1, \ldots, S_n$ are subsets of $\mathbb{F}$ with sizes $|S_i|$ satisfying $|S_i| > t_i$, then there exist elements $s_1 \in S_1, s_2 \in S_2, \ldots, s_n \in S_n$ such that

$$f(s_1, s_2, \ldots, s_n) \ne 0.$$
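A tiny instance illustrates the size condition $|S_i| > t_i$ in Lemma D.1. The sketch below is an assumption-laden toy: it takes $f(x, y) = xy + x + y$ over GF(5), for which $\deg f = 2 = t_1 + t_2$ with $t_1 = t_2 = 1$ and the coefficient of $xy$ is nonzero, and searches a grid by brute force. The function name `find_nonroot` is hypothetical.

```python
from itertools import product

P = 5  # assumed prime field size, GF(5)

def f(x, y):
    """Toy polynomial x*y + x + y, evaluated modulo P."""
    return (x * y + x + y) % P

def find_nonroot(poly, grids):
    """Search S_1 x ... x S_n for a point where poly is non-zero.
    Returns the first such point, or None if poly vanishes on the grid."""
    for point in product(*grids):
        if poly(*point) != 0:
            return point
    return None

# Grids with |S_i| = 2 > t_i = 1: the lemma guarantees a non-root exists.
LARGE_GRID_POINT = find_nonroot(f, [[0, 1], [0, 4]])

# Grids with |S_i| = 1 = t_i: the hypothesis fails and f can vanish everywhere.
SMALL_GRID_POINT = find_nonroot(f, [[0], [0]])
```

In the proofs that follow, the same guarantee is applied to a product of determinants in the entries of a matrix of indeterminates, with each $S_i$ taken to be the whole field.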
Set

$$\nu = mr + (m - 1)(\delta - 1), \qquad \Delta = n - m n_L.$$

Let $\mathcal{C}_L$ be an $((n_L, r, d), (\alpha, \beta))$ MSR code and let $G_L$ be an $(r\alpha \times n_L\alpha)$ generator matrix for $\mathcal{C}_L$. Let $m \ge 2$ be an integer. Consider the vector code $\mathcal{C}$, generated by the $(mr\alpha \times n\alpha)$ generator matrix

$$G = \left[\begin{array}{ccc|c} G_L & & & \\ & \ddots & & Q \\ & & G_L & \end{array}\right] \tag{133}$$

in which $Q$ is an $(mr\alpha \times \Delta\alpha)$ matrix over $\mathbb{F}_q$ and the block matrix $G_L$ appears $m$ times along the diagonal. It is clear that the code $\mathcal{C}$ has length $n$ and also has $m$ support-disjoint local MSR codes, each generated by $G_L$. Thus, $\mathcal{C}$ is an MSR-local code with $(r, \delta)$ information locality. As a result, the right hand side of (92) with $K = mr\alpha$, is an upper bound on the minimum distance, $d_{\min}$ of $\mathcal{C}$.

We will now show that it is possible to pick the matrix $Q$ such that the minimum distance of $\mathcal{C}$ is indeed given by (92). Towards this, we treat the entries of $Q$ as indeterminates, with the $(i, j)$-th element of $Q$ denoted by $x_{ij}$. We write $G(X)$ to indicate that the generator matrix is a function of the matrix $X = [x_{ij}]$ and $G(Q)$, its evaluation at $X = Q$.

The expression in (92) for $d_{\min}$ can be rewritten in the form

$$d_{\min} = n - mr + 1 - (m - 1)(\delta - 1), \tag{134}$$

so that $n - d_{\min} + 1 = mr + (m - 1)(\delta - 1) = \nu$. It follows that it suffices to show that all the $(K \times \nu\alpha)$ sub-matrices of $G(Q)$ that are obtained by selecting a set of $\nu$ thick columns drawn from the generator matrix $G$, are of full rank.

Let $S_i$ denote the support of the $i$-th local code, $i \in [m]$. If the $\nu$ thick columns have indices chosen from $\cup_{i=1}^{m} S_i$, then it can be shown that the full rank condition is always satisfied simply because one is forced to pick at least $r$ thick columns with indices from the support $S_i$ of each local code. It follows that it is enough to ensure that

\begin{align}
\text{rank}(G(Q)|_T) &= K, \quad \forall T \in \mathcal{T}, \tag{135}\\
\text{where } \mathcal{T} &= \{T \subseteq [n] : |T| = mr, \ |T \cap S_i| \le r, \ \forall i \in [m]\}. \tag{136}
\end{align}



Note that $G(Q)|_T$ is a square $(K \times K)$ matrix. Next, consider the set of polynomials $f_T(X) = \det(G(X)|_T)$, $T \in \mathcal{T}$, where $\det(A)$ denotes the determinant of the square matrix $A$. Also, let $f(X) = \prod_{T \in \mathcal{T}} f_T(X)$. The degree of any individual indeterminate $x_{ij}$ in $f(X)$ is at most $|\mathcal{T}|$. Noting that

$$|\mathcal{T}| \le \binom{n}{mr},$$

by applying Lemma D.1 we conclude that there exists a matrix $Q$ such that $f(Q) \ne 0$, whenever the base field $\mathbb{F}_q$ has size $q > \binom{n}{mr}$.



Appendix E 
Proof of Theorem 6.6

The proof is similar to the proof of Theorem 3.4. Let the integer $\ell$ be defined from

$$K = \alpha \ell.$$

The idea is to first construct a partial parity-check matrix consisting of $m$ disjoint local parity matrices, which ensures that the locality constraints are satisfied. Then we show that one can always add extra rows to this partial parity-check matrix in order to guarantee the optimum minimum distance.
Let $H_L$ denote the parity check matrix of an $((n_L, r, d), (\alpha, \beta))$ MSR code $\mathcal{C}_L$. By Lemma 4.3 we know that $\mathcal{C}_L$ is vector-MDS. Thus the dual code $\mathcal{C}_L^{\perp}$, generated by $H_L$, will also be vector-MDS. Consider the $(m(\delta - 1)\alpha \times n\alpha)$ matrix $H_0$ given by

$$H_0 = \begin{bmatrix} H_L & & \\ & \ddots & \\ & & H_L \end{bmatrix} \tag{137}$$

in which the matrix $H_L$ appears $m$ times along the diagonal. Also, let $\mathcal{C}_0$ denote the code whose parity check matrix is $H_0$. Next, consider the code $\mathcal{C}$ whose parity check matrix $H$ is obtained by augmenting $H_0$ with additional rows as shown below:

$$H = \begin{bmatrix} H_0 \\ H_1 \end{bmatrix} \tag{138}$$

where $H_1$ is an $((n - \ell - m(\delta - 1))\alpha \times n\alpha)$ matrix. As a result, $H$ is an $((n - \ell)\alpha \times n\alpha)$ matrix. Note that under optimality, $n - \ell \ge m(\delta - 1)$ (since $d_{\min} \ge \delta$ for a code with $(r, \delta)$ information locality).

Let $\mathcal{C}_0^{\perp}, \mathcal{C}^{\perp}$ denote the dual codes of $\mathcal{C}_0, \mathcal{C}$ respectively. Thus $\mathcal{C}_0^{\perp}, \mathcal{C}^{\perp}$ are the row spaces of $H_0, H$ respectively. It is clear that $\mathcal{C}$ has $(r, \delta)$ locality with each local code being a sub-code of an MSR code.

Let $S \subseteq [n]$ such that $|S| = \nu$ be referred to as a $\nu$-core of $\mathcal{C}_0^{\perp}$ if $\forall$ non-zero $c_0 \in \mathcal{C}_0^{\perp}$, $\text{supp}(c_0) \not\subseteq S$. We will now show that if the matrix $H_1$ is selected in such a way that any $S$ which is an $\ell$-core of $\mathcal{C}_0^{\perp}$ is also an $\ell$-core of $\mathcal{C}^{\perp}$, then the minimum distance of $\mathcal{C}$ will be given by (93). We will subsequently show by appealing to Lemma D.1 that it is always possible to pick $H_1$ such that the above condition is met.

Let $S_i$, $i \in [m]$ denote the disjoint supports of the $m$ local codes of $\mathcal{C}_0$ (and hence of $\mathcal{C}$ as well). Clearly, $S$ is an $\ell$-core of $\mathcal{C}_0^{\perp}$ if and only if $|S \cap S_i| \le r$, $\forall i \in [m]$. We note that any $T \subseteq S_i$, $|T| \le r$ can be extended to an $\ell$-core of $\mathcal{C}_0^{\perp}$. It is also clear that the code $\mathcal{C}^{\perp}$ when shortened to $S_i$ has $H_L$ as a submatrix of its generator matrix.

Let the matrix $H_1$ be such that any $S$ which is an $\ell$-core of $\mathcal{C}_0^{\perp}$ is also an $\ell$-core of $\mathcal{C}^{\perp}$. This has the following implication: the code $\mathcal{C}^{\perp}$ when shortened to $S_i$ has generator matrix $H_L$, for otherwise, if it were to contain one or more additional rows, we would be able to find a code-word $c'$ of $\mathcal{C}^{\perp}$ that does not belong to $\mathcal{C}_0^{\perp}$ and that is supported on $T \subseteq S_i$, $|T| \le r$. But this would then contradict the assumption that any $S$ which is an $\ell$-core of $\mathcal{C}_0^{\perp}$ is also an $\ell$-core of $\mathcal{C}^{\perp}$, as $\ell \ge r$. We thus conclude that $\mathcal{C}|_{S_i}$ is an MSR code whose parity check matrix is given by $H_L$. Thus the code $\mathcal{C}$ is an MSR-local code with $(r, \delta)$ all-symbol locality (we were able to assert earlier only that each local code is a sub-code of an MSR code).

Let $G$ denote the generator matrix of $\mathcal{C}$. Note that if $S$ is any $\ell$-core of $\mathcal{C}_0^{\perp}$ (and hence of $\mathcal{C}^{\perp}$ as well), it must be that $\text{Rank}(G|_S) = \ell\alpha = K$, because this means that there cannot be any dependencies in the $\ell$ thick columns of $G|_S$.

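We continue under the assumption as above, that any $S$ which is an $\ell$-core of $\mathcal{C}_0^{\perp}$ is also an $\ell$-core of $\mathcal{C}^{\perp}$. Next, let $T \subseteq [n]$ of size $|T| \ge \ell$ be such that $\text{Rank}(G|_T) < K$ (such a $T$ can be constructed by ensuring that $|T \cap S_i| \ge (r + 1)$ for some $i$). Clearly, $T$ does not contain any $\ell$-core of $\mathcal{C}^{\perp}$ (and hence does not contain any $\ell$-core of $\mathcal{C}_0^{\perp}$ as well), which implies that at least for some $i \in [m]$, $|T \cap S_i| \ge (r + 1)$. Let the integers $b_j$, $1 \le j \le (\delta - 1)$ be defined as follows:

$$b_j = |\{i \in [m] : |S_i \cap T| = r + j\}|. \tag{139}$$

Also, define the sets $Q_i$, $i \in [m]$ as follows:

$$Q_i = \begin{cases} T \cap S_i, & \text{if } |T \cap S_i| \le r, \\ \text{(any) } Q_i \subseteq T \cap S_i \text{ s.t. } |Q_i| = r, & \text{if } |T \cap S_i| > r. \end{cases} \tag{140}$$

Also, let $Q = \cup_{i=1}^{m} Q_i$. Note that $|Q| = |T| - \sum_{j=1}^{\delta-1} j b_j$. Clearly, the set $Q$ is a $|Q|$-core of $\mathcal{C}^{\perp}$ and

$$|T| - (\delta - 1)\Big(\sum_{j=1}^{\delta-1} b_j\Big) \le |T| - \sum_{j=1}^{\delta-1} j b_j = |Q| \le \ell - 1. \tag{141}$$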
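Next, let $M = \{i \in [m] : |S_i \cap T| \ge r + 1\}$ and note that $|M| = \sum_{j=1}^{\delta-1} b_j$. If we pick $r$ elements from each set $S_i$, $i \in M$, we will then obtain an $(|M|r)$-core. Thus we have that

$$\sum_{j=1}^{\delta-1} b_j \le \left\lfloor \frac{\ell - 1}{r} \right\rfloor = \left\lceil \frac{\ell}{r} \right\rceil - 1. \tag{142}$$

Combining (141) and (142), we get that

\begin{align}
|T| &\le \ell - 1 + (\delta - 1)\left(\left\lceil \frac{\ell}{r} \right\rceil - 1\right) \tag{143}\\
&= \frac{K}{\alpha} - 1 + (\delta - 1)\left(\left\lceil \frac{K}{r\alpha} \right\rceil - 1\right). \tag{144}
\end{align}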



It follows from Lemma 



that 



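$$d_{\min} \ge n - \frac{K}{\alpha} + 1 - (\delta - 1)\left(\left\lceil \frac{K}{r\alpha} \right\rceil - 1\right)$$

and then from the $K$-bound (81) that the code $\mathcal{C}$ has minimum distance equal to

$$d_{\min} = n - \frac{K}{\alpha} + 1 - (\delta - 1)\left(\left\lceil \frac{K}{r\alpha} \right\rceil - 1\right). \tag{145}$$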



It remains to be proved that one can pick a matrix $H_1$ such that any $S$ which is an $\ell$-core of $\mathcal{C}_0^{\perp}$ is also an $\ell$-core of $\mathcal{C}^{\perp}$. Towards this, consider a set $S$ such that $|S| = \ell$ and let $S^c$ denote the set $[n] \setminus S$. Note that $S$ is an $\ell$-core of $\mathcal{C}^{\perp}$ if and only if the square matrix $H|_{S^c}$ is full rank, i.e., $\det(H|_{S^c}) \ne 0$. Now, we need to pick $H_1$ such that for all $S \subseteq [n]$, $S$ an $\ell$-core of $\mathcal{C}_0^{\perp}$, $\det(H|_{S^c}) \ne 0$. This can be done using a similar technique as in Theorem 6.2 where we used Lemma D.1 to pick the matrix $Q$ in (133). One can show that there exists a matrix $H_1$ such that $H|_{S^c}$ is full rank for all $S$, $\ell$-core of $\mathcal{C}_0^{\perp}$. Here also we take $H_1$ to be a matrix of indeterminates and each $\ell$-core of $\mathcal{C}_0^{\perp}$ gives us a determinant and hence a non-zero polynomial whose evaluation must be nonzero. Noting that $\mathcal{C}_0^{\perp}$ has at most $\binom{n}{\ell}$ $\ell$-cores and using the Combinatorial Nullstellensatz of Lemma D.1 as we did in Theorem 6.2, we conclude that such $H_1$ can be picked if the field size $q > \binom{n}{\ell}$.



Appendix F 
Proof of Theorem 7.4

All claims in the theorem are clear with the exception of the claim concerning the minimum distance. Since $K_L \mid K$, an upper bound on $d_{\min}$ from (94) is given by

\begin{align}
d_{\min} &\le n - \frac{K}{K_L} r + 1 - \left(\frac{K}{K_L} - 1\right)(\delta - 1), \tag{146}\\
&= m(r + \delta - 1) - \ell r + 1 - (\ell - 1)(\delta - 1), \tag{147}\\
&= (m - \ell) n_L + \delta. \tag{148}
\end{align}

It suffices to show that any pattern of $\delta + (m - \ell) n_L - 1$ erasures can be corrected by the code. Towards this, we



D n 



(m- £)N L + A L = (m-£) 



+ 1. 



note that the scalar code A employed in Construction |7.3| has minimum distance given by 

Now we argue that when any pattern of $(m - \ell) n_L + \delta - 1$ vector code symbols are erased, this leads to the erasure of at most $D_{\min} - 1$ scalar code symbols. This would imply that the code, $\mathcal{A}$, can recover from this many erasures and hence, so can the vector code $\mathcal{C}$.



As in the proof of Theorem 7.2 for a given pattern of (m — l)riL + 5 — 1 vector code symbol erasure, let 
7i) 1 < i < m be the number of code-word symbols erased from the i th local code among these symbols. Note that 
0<7i<<5 — 1 < ri£, 1 < i < m and Y^iLi 7« = ( m ~ ^) n L + 5 — 1. Thus the number of scalar code symbols lost 
by the code in this pattern of erasures, L has to be that 

<{m-£)\ : j + 



L 



m 

E 

i=l 



2 



5-1 
2 



A, 



where we have used the fact that 



+ 



< 



). The result follows. 
Appendix G
Proof of Theorem 8.1

We will make use of the following two facts and Lemma 5.3 to prove the theorem. Their proofs are straightforward and are hence omitted.

Lemma G.1: Consider two sets $S_1$ and $S_2$ such that $S_1 \subseteq S_2 \subseteq [n]$, and

$$\text{q-dim}(\mathcal{C}|_{S_2}) - \text{q-dim}(\mathcal{C}|_{S_1}) \triangleq \Delta\nu > 0. \tag{149}$$

Then, if $\mathcal{I}$ is any minimum cardinality information set for $\mathcal{C}|_{S_2}$, then it must be true that $|\mathcal{I} \cap (S_2 \setminus S_1)| \ge \Delta\nu$.

Lemma G.2: Consider two sets $S_1$ and $S_2$, such that $S_1 \subseteq S_2 \subseteq [n]$, and $\text{rank}(G|_{S_1}) = \text{rank}(G|_{S_2}) = K$. Then $\text{q-dim}(\mathcal{C}|_{S_1}) \ge \text{q-dim}(\mathcal{C}|_{S_2})$.

We will assume in the proof, that we are given an $[n, K, d_{\min}, \alpha]$ code $\mathcal{C}$ which has $(r, \delta)$ information locality. We will construct a set $T \subseteq [n]$ such that $\text{rank}(G|_T) < K$ using Algorithm 1 (the same algorithm which is used in the proof of Theorem 5.1), and then apply Lemma 5.3 to get the required result.

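Let the algorithm exit after $J$ iterations, i.e., $j = J$ when the algorithm exits. Let $S'' = T_{J-1} \cup S_i$, where $i$ is the index picked in the $J$-th iteration. Note that necessarily, $\text{Rank}(G|_{S''}) = K$.

Let $\mathcal{I}'$ denote a minimum cardinality information set for $\mathcal{C}|_{S''}$. Clearly, it must be true that

$$J \ge \left\lceil \frac{|\mathcal{I}'|}{r} \right\rceil. \tag{150}$$

Next, for $j \in [J]$, let

$$s_j = |T_j| - |T_{j-1}|, \qquad \nu_j = \text{q-dim}(\mathcal{C}|_{T_j}) - \text{q-dim}(\mathcal{C}|_{T_{j-1}}). \tag{151}$$

We claim that for $j \in [J - 1]$,

$$s_j \ge \nu_j + (\delta - 1). \tag{152}$$

To see this, first note whenever we pick $i \in \mathcal{L}$ such that $V_i \not\subseteq \sum_{\ell \in T} W_\ell$, since $d_{\min}(\mathcal{C}|_{S_i}) \ge \delta$, it must be true that $s_j \ge 1 + (\delta - 1) = \delta$. Also, whenever $\nu_j > 0$, Lemma G.1 implies that (152) must be true and thus we see that (152) is true always. We also have that

$$s_J \ge \nu_J. \tag{153}$$

Summing up, we obtain that

\begin{align}
|T_J| = \sum_{j=1}^{J} s_j &\ge \sum_{j=1}^{J} \nu_j + (J - 1)(\delta - 1) \tag{154}\\
&\ge \sum_{j=1}^{J} \nu_j + \left(\left\lceil \frac{|\mathcal{I}'|}{r} \right\rceil - 1\right)(\delta - 1) \tag{155}\\
&\ge (|\mathcal{I}'| - 1) + \left(\left\lceil \frac{|\mathcal{I}'|}{r} \right\rceil - 1\right)(\delta - 1), \tag{156}
\end{align}

where (155) follows from (150) and (156) follows by noting that

$$\sum_{j=1}^{J} \nu_j = \text{q-dim}(\mathcal{C}|_{T_J}) \ge |\mathcal{I}'| - 1, \tag{157}$$

which is because of the maximality of $S_{\text{end}}$ in $S_i$ (i.e., even adding one more element of $S_i$ to $S_{\text{end}}$ in step 8 of Algorithm 1 would result in an accumulated rank of $K$ and thus $\text{q-dim}(\mathcal{C}|_{T_J}) \ge |\mathcal{I}'| - 1$). Now, since $\text{rank}(G|_{T_J}) < K$, Lemma 5.3 can be applied to give that

\begin{align}
d_{\min} &\le n - |\mathcal{I}'| + 1 - \left(\left\lceil \frac{|\mathcal{I}'|}{r} \right\rceil - 1\right)(\delta - 1), \tag{158}\\
&\le n - |\mathcal{I}_0| + 1 - \left(\left\lceil \frac{|\mathcal{I}_0|}{r} \right\rceil - 1\right)(\delta - 1), \tag{159}
\end{align}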



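where, as $S'' \subseteq \cup_{i \in \mathcal{L}} S_i$, by Lemma G.2 we have that $|\mathcal{I}'| \ge |\mathcal{I}_0|$, leading to (159), where $\mathcal{I}_0$ is as defined in Theorem 8.1. The bound in (105) then follows from Lemma G.2. Further, since $k \ge \lceil \frac{K}{\alpha} \rceil$, (105) can be upper bounded as follows:

\begin{align}
d_{\min} &\le n - k + 1 - \left(\left\lceil \frac{k}{r} \right\rceil - 1\right)(\delta - 1) \\
&\le n - \left\lceil \frac{K}{\alpha} \right\rceil + 1 - \left(\left\lceil \frac{1}{r}\left\lceil \frac{K}{\alpha} \right\rceil \right\rceil - 1\right)(\delta - 1) \\
&= n - \left\lceil \frac{K}{\alpha} \right\rceil + 1 - \left(\left\lceil \frac{K}{r\alpha} \right\rceil - 1\right)(\delta - 1),
\end{align}

where the last equation follows since $\left\lceil \frac{1}{r}\left\lceil \frac{K}{\alpha} \right\rceil \right\rceil = \left\lceil \frac{K}{r\alpha} \right\rceil$. This concludes the proof of the theorem.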



